torch.compile compilation cache prevents slow cold starts

warning

performance

Updated Mar 24, 2026

Sources

torch.compile integration - vLLM (docs.vllm.ai)

Technologies:

BentoML (subject)

vLLM: Symptoms of this issue are visible in vLLM metrics and logs

How to detect:

Without a pre-populated compilation cache, vLLM instances start slowly because torch.compile has to compile the model before serving. This first-time compilation generates the cache artifacts and can take seconds to minutes, depending on model size and batch configuration.
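As a rough pre-flight check, a deployment script can test whether the cache directory exists and holds any files before starting vLLM. The sketch below assumes the default cache location mentioned on this page; the helper name cache_is_populated is hypothetical and is not part of vLLM. A populated directory only suggests that a warm start is possible. It does not prove that the cached artifacts match the model and configuration being launched.

```python
from pathlib import Path

# Default cache location named on this page; adjust if your deployment uses another path.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "vllm" / "torch_compile_cache"


def cache_is_populated(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Return True if cache_dir exists and contains at least one file.

    Hypothetical helper: it only inspects the filesystem and says nothing
    about whether vLLM will actually get a cache hit.
    """
    if not cache_dir.is_dir():
        return False
    # Walk the whole tree, since artifacts may sit in nested subdirectories.
    return any(p.is_file() for p in cache_dir.rglob("*"))


if __name__ == "__main__":
    state = "populated" if cache_is_populated() else "missing or empty"
    print(f"{DEFAULT_CACHE_DIR}: {state}")
```

If the check reports a missing or empty cache, expect the slow first start described above.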

Recommended action:

Copy the entire ~/.cache/vllm/torch_compile_cache directory from a previously compiled instance to each deployment target. When the target runs the same model, configuration, and software versions, later starts load the cached artifacts instead of recompiling. To disable the cache for debugging, set VLLM_DISABLE_COMPILE_CACHE=1. To inspect the compiled code, set VLLM_COMPILE_CACHE_SAVE_FORMAT=unpacked, or set compile_cache_save_format=unpacked in the compilation config.
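The copy step and the two debugging switches can be scripted. The following is a minimal sketch, not vLLM tooling: copy_compile_cache and debug_env are hypothetical helper names, and the source and destination paths are whatever your build and deployment hosts use. Only the two environment variable names come from the text above.

```python
import os
import shutil
from pathlib import Path


def copy_compile_cache(src: Path, dst: Path) -> int:
    """Copy the whole cache tree from src into dst.

    Returns the number of files found under dst after the copy.
    Hypothetical helper; existing files in dst with the same name are overwritten.
    """
    if not src.is_dir():
        raise FileNotFoundError(f"no compilation cache at {src}")
    # dirs_exist_ok=True (Python 3.8+) lets the copy merge into an existing dst.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())


def debug_env(disable_cache: bool = False, unpacked: bool = False) -> dict:
    """Return a copy of the current environment with the debugging switches set.

    Pass the result as env= to the process that launches vLLM.
    """
    env = dict(os.environ)
    if disable_cache:
        env["VLLM_DISABLE_COMPILE_CACHE"] = "1"
    if unpacked:
        env["VLLM_COMPILE_CACHE_SAVE_FORMAT"] = "unpacked"
    return env
```

A build host could call copy_compile_cache on its cache directory with a shared volume as the destination, and each target could copy from that volume into its own ~/.cache/vllm/torch_compile_cache before launch. Copying the full tree rather than individual files follows the recommendation above to move the entire directory.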