40x faster AI inference: ONNX to TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment
Updated Nov 14, 2025 · Python
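For context, the ONNX-to-TensorRT path this card describes is commonly driven with NVIDIA's `trtexec` tool. A minimal sketch of FP16 and INT8 engine builds (file names are illustrative, and a real INT8 build also needs calibration data to preserve accuracy):

```shell
# Build an FP16 TensorRT engine from an ONNX model (hypothetical paths)
trtexec --onnx=model.onnx \
        --saveEngine=model_fp16.engine \
        --fp16

# INT8 build; production deployments also supply a calibration cache
trtexec --onnx=model.onnx \
        --saveEngine=model_int8.engine \
        --int8
```

FP16 usually halves memory traffic with little accuracy loss; INT8 can go further but requires representative calibration inputs.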
This repo focuses on latency-aware resource optimization for Kubernetes.
XRDrone streams live drone video to VR, runs real-time object detection, and overlays visual effects in 3D.
A distributed Java system that dynamically allocates computational tasks based on real-time latency and client performance. It uses AVL-based scheduling, RSA-secured communication, and asynchronous task execution to boost efficiency in mid-scale clusters.
Request hedging for tail latency reduction in distributed systems
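The last entry names a well-known tail-latency technique: if a request has not completed within a hedge delay, fire a duplicate ("hedged") request and take whichever response arrives first. A minimal sketch using only the standard library (the helper name and delay values are illustrative, not that repo's API):

```python
import concurrent.futures as cf
import random
import time

def request_hedging(call, hedge_delay, max_hedges=1):
    """Issue a request; if it hasn't finished within hedge_delay seconds,
    fire a duplicate ("hedge") and return whichever result arrives first."""
    with cf.ThreadPoolExecutor(max_workers=max_hedges + 1) as pool:
        futures = [pool.submit(call)]
        for _ in range(max_hedges):
            # Wait up to hedge_delay for the outstanding requests.
            done, _ = cf.wait(futures, timeout=hedge_delay,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                break
            futures.append(pool.submit(call))  # original is slow: hedge it
        # Return the first completed result (hedged or not).
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return done.pop().result()

def flaky_backend():
    # Simulate a server whose tail latency is occasionally high.
    time.sleep(random.choice([0.01, 0.01, 0.01, 0.5]))
    return "ok"

print(request_hedging(flaky_backend, hedge_delay=0.05))  # prints "ok"
```

Note the trade-off: hedging cuts p99 latency at the cost of extra load, so the hedge delay is usually set near the backend's p95 latency, and in this simple sketch the executor still waits for the losing request to finish before returning.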