Latency optimizations for edge inference
How to reduce end-to-end latency for on-device ML: batching, model quantization, and near-source routing. The full article and reproducible artifacts are available in the docs and repo.
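
As a taste of the quantization piece, here is a minimal sketch of post-training dynamic quantization in PyTorch (an assumed stack; the stand-in `nn.Sequential` model and the timing loop are illustrative only, and the repo's artifacts are the authoritative version). Linear-layer weights are stored as int8 and dequantized on the fly, which typically shrinks the model and cuts CPU inference latency on edge hardware.

```python
import time

import torch
import torch.nn as nn

# Stand-in model; substitute the real on-device network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Quantize only the Linear modules to int8; activations stay float.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough single-input latency comparison; real measurements should also
# sweep batch sizes, since batching is another of the levers covered here.
x = torch.randn(1, 512)
with torch.inference_mode():
    for name, m in (("fp32", model), ("int8", quantized)):
        start = time.perf_counter()
        for _ in range(100):
            m(x)
        print(f"{name}: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms/call")
```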