Latency optimizations for edge inference
How to reduce end-to-end latency for on-device ML: batching, model quantization, and near-source routing. The full article and reproducible artifacts are available in the docs and repo.
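
As a taste of the quantization piece, here is a minimal sketch of post-training dynamic quantization in PyTorch (an assumed stack; the stand-in `nn.Sequential` model and the timing loop are illustrative only, and the repo's artifacts are the authoritative version). Linear-layer weights are stored as int8 and dequantized on the fly, which typically shrinks the model and cuts CPU inference latency on edge hardware.

```python
import time

import torch
import torch.nn as nn

# Stand-in model; substitute the real on-device network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Quantize only the Linear modules to int8; activations stay float.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough single-input latency comparison; real measurements should also
# sweep batch sizes, since batching is another of the levers covered here.
x = torch.randn(1, 512)
with torch.inference_mode():
    for name, m in (("fp32", model), ("int8", quantized)):
        start = time.perf_counter()
        for _ in range(100):
            m(x)
        print(f"{name}: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms/call")
```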