Auto-tuning disabled by default causes suboptimal kernel performance
info | performance | Updated Mar 24, 2026
How to detect:
When vLLM compiles for specific batch sizes via compile_sizes, it can auto-tune Triton kernels for those sizes. Auto-tuning is disabled by default because it adds seconds to minutes to first-time compilation. As a result, deployments that keep the default run with general-purpose kernel configurations that may not be optimal for their batch sizes.
Recommended action:
For maximum performance, enable compilation for the batch sizes you expect to serve, for example --compilation_config '{"compile_sizes": [1, 2, 4, 8]}', and accept the longer initial compilation time. Results are cached, so subsequent runs do not repeat the tuning. Auto-tuning benchmarks different kernel configurations and can find significantly faster alternatives than the default implementations.
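The JSON value passed to the flag is easy to mis-quote when it is assembled by hand in a deployment script. The following is a minimal sketch, using only the Python standard library, of one way to build the command line shown above. The helper name build_serve_command and the model name my-org/my-model are hypothetical placeholders, and the sketch assumes the server is started with vllm serve; it only constructs the command string and does not start vLLM.

```python
import json
import shlex


def build_serve_command(model: str, compile_sizes: list[int]) -> str:
    """Return a shell-safe `vllm serve` command that compiles for the given batch sizes.

    `model` and this helper are illustrative; only the compile_sizes key and the
    --compilation_config flag come from the recommendation above.
    """
    if not compile_sizes or any(size <= 0 for size in compile_sizes):
        raise ValueError("compile_sizes must be a non-empty list of positive integers")

    # Deduplicate and sort so the same set of sizes always yields the same string.
    config = {"compile_sizes": sorted(set(compile_sizes))}

    args = ["vllm", "serve", model, "--compilation_config", json.dumps(config)]
    # shlex.join quotes the JSON so the shell passes it as a single argument.
    return shlex.join(args)


command = build_serve_command("my-org/my-model", [8, 1, 4, 2])
print(command)
# vllm serve my-org/my-model --compilation_config '{"compile_sizes": [1, 2, 4, 8]}'
```

Keeping the list short and limited to batch sizes that actually occur in traffic is a reasonable starting point, since each listed size adds to the initial compilation time.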