BentoML, vLLM

Auto-tuning disabled by default causes suboptimal kernel performance

info
performance
Updated Mar 24, 2026
How to detect:

When vLLM compiles for the specific batch sizes listed in compile_sizes, it can auto-tune Triton kernels for those sizes. Auto-tuning is disabled by default because it adds seconds to minutes to first-time compilation. A deployment launched without compile_sizes in its compilation config therefore runs with default, untuned kernel configurations.

Recommended action:

For maximum performance, enable compilation for specific batch sizes with --compilation_config '{"compile_sizes": [1, 2, 4, 8]}' and accept the longer initial compilation time. Results are cached for subsequent runs. Auto-tuning benchmarks different kernel configurations and can find significantly faster alternatives than the default implementations.
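
Because the flag value is a JSON string, it is easy to get the shell quoting wrong when the launch command is generated by a script. The following is a minimal sketch of one way to build the command safely. The helper name build_serve_command and the model name my-org/my-model are hypothetical placeholders, and the `vllm serve MODEL` launch form is an assumption about how the server is started; only the --compilation_config flag and the compile_sizes key come from the text above.

```python
import json
import shlex


def build_serve_command(model: str, compile_sizes: list[int]) -> list[str]:
    """Return an argv list that passes compile_sizes via --compilation_config.

    Hypothetical helper for illustration; it only assembles strings and
    does not import or call vLLM.
    """
    if not compile_sizes:
        raise ValueError("compile_sizes must contain at least one batch size")
    for size in compile_sizes:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(size, bool) or not isinstance(size, int) or size < 1:
            raise ValueError(f"invalid batch size: {size!r}")

    # De-duplicate and sort so the generated config is stable between runs.
    config = {"compile_sizes": sorted(set(compile_sizes))}

    # Assumption: the server is launched as `vllm serve MODEL ...`.
    return ["vllm", "serve", model, "--compilation_config", json.dumps(config)]


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder, not a real model identifier.
    argv = build_serve_command("my-org/my-model", [1, 2, 4, 8])
    # shlex.join quotes the JSON argument so it can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the argv list directly to a process launcher avoids shell quoting altogether; shlex.join is only needed when the command has to be printed or stored as a single string.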