Auto-tuning disabled by default causes suboptimal kernel performance

info

performance

Updated Mar 24, 2026

Sources

torch.compile integration - vLLM (docs.vllm.ai)

Technologies:

BentoML (subject)

vLLM: symptoms of this issue are visible in vLLM metrics and logs

How to detect:

When vLLM compiles for the specific batch sizes listed in compile_sizes, every shape in the computation graph is static and known, so vLLM can auto-tune Triton kernels for those sizes. However, auto-tuning is disabled by default because it adds seconds to minutes to the first compilation. As a result, production deployments that never set compile_sizes run with default kernel configurations rather than tuned ones.
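The cost trade-off above can be illustrated with a toy autotuner. This is a minimal sketch and not vLLM's, Inductor's, or Triton's implementation. The (block_size, num_warps) configs, `autotune`, and the fake kernels are all hypothetical. For each static batch size, the sketch times every candidate configuration, which is the slow first-time step. It then stores the winner in a cache so that later lookups for the same size skip benchmarking entirely.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical kernel configuration, e.g. (block_size, num_warps).
# Real Triton/Inductor configs carry more fields; this is illustrative only.
Config = Tuple[int, int]
Kernel = Callable[[], None]
KernelFactory = Callable[[int, Config], Kernel]


def best_time(kernel: Kernel, repeats: int = 3) -> float:
    """Run the kernel several times and return the fastest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best


def autotune(
    batch_size: int,
    candidates: List[Config],
    make_kernel: KernelFactory,
    cache: Dict[int, Config],
) -> Config:
    """Pick the fastest config for one static batch size, caching the result."""
    if batch_size in cache:
        # Cache hit: no benchmarking, analogous to a later run reusing tuned kernels.
        return cache[batch_size]
    # Cache miss: benchmark every candidate. This is the slow first-time cost.
    timings = {cfg: best_time(make_kernel(batch_size, cfg)) for cfg in candidates}
    winner = min(timings, key=timings.__getitem__)
    cache[batch_size] = winner
    return winner


# Toy usage: pretend (64, 4) is the fastest config by making the others sleep.
def fake_kernel_factory(batch_size: int, cfg: Config) -> Kernel:
    delay = 0.0 if cfg == (64, 4) else 0.005
    return lambda: time.sleep(delay)


cache: Dict[int, Config] = {}
candidates = [(32, 2), (64, 4), (128, 8)]
for bs in [1, 2, 4, 8]:
    print(bs, autotune(bs, candidates, fake_kernel_factory, cache))
```

Tuning time in this sketch grows with (number of sizes) x (number of candidates) x (repeats). That growth is why tuning is only worthwhile when the set of batch sizes is small and fixed.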

Recommended action:

For maximum performance, enable compilation for specific batch sizes with --compilation_config '{"compile_sizes": [1, 2, 4, 8]}' and accept the longer initial compilation time. The tuning results are cached, so subsequent runs skip the benchmarking step. Auto-tuning benchmarks different kernel configurations for each listed size and can find significantly faster alternatives to the default implementations.
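A small helper can assemble this flag so that the JSON is always well-formed and the sizes are deduplicated and sorted. This is a hypothetical sketch: `build_serve_command` and the model name "my-org/my-model" are placeholders. The helper only builds an argument list (usable with `subprocess.run`) and does not launch vLLM. The `vllm serve` command and the `--compilation_config` flag follow the text above. Check your vLLM version's CLI help for the exact flag spelling it accepts.

```python
import json
import shlex
from typing import Iterable, List


def build_serve_command(model: str, compile_sizes: Iterable[int]) -> List[str]:
    """Build a `vllm serve` argv that compiles (and auto-tunes) for specific batch sizes."""
    # Deduplicate and sort so the config is stable across invocations.
    sizes = sorted({int(s) for s in compile_sizes})
    if not sizes or sizes[0] < 1:
        raise ValueError("compile_sizes must contain positive batch sizes")
    config = {"compile_sizes": sizes}
    return ["vllm", "serve", model, "--compilation_config", json.dumps(config)]


argv = build_serve_command("my-org/my-model", [8, 4, 2, 1, 4])
# shlex.join quotes the JSON argument so it can be pasted into a shell.
print(shlex.join(argv))
# prints: vllm serve my-org/my-model --compilation_config '{"compile_sizes": [1, 2, 4, 8]}'
```

Keeping compile_sizes to the batch sizes your traffic actually hits limits the one-time tuning cost. Listing many sizes multiplies it.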