torch.compile compilation cache prevents slow cold starts
warning · performance · Updated Mar 24, 2026
How to detect:
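Without a pre-populated compilation cache, every vLLM instance has to run torch.compile at startup, which makes cold starts slow. First-time compilation, which produces the cached artifacts, can take anywhere from seconds to minutes depending on model size and batch configuration. The symptom is a slow first start on an instance whose compilation cache directory is empty or missing.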
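One way to check for this condition before starting vLLM is to look at the cache directory itself. The sketch below assumes the default ~/.cache/vllm/torch_compile_cache location named in this entry, and `compile_cache_is_populated` is a hypothetical helper, not part of vLLM. It treats "contains at least one file" as populated and does not validate whether the cached artifacts match the model you are about to serve.

```python
from pathlib import Path

# Default location named in this entry; adjust if your deployment relocates the vLLM cache.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "vllm" / "torch_compile_cache"


def compile_cache_is_populated(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Return True if cache_dir exists and contains at least one regular file.

    A missing or empty directory suggests the next vLLM start will have to
    run torch.compile from scratch (a cold start).
    """
    if not cache_dir.is_dir():
        return False
    # rglob("*") walks the tree lazily; any() stops at the first file found.
    return any(p.is_file() for p in cache_dir.rglob("*"))


if __name__ == "__main__":
    state = "populated" if compile_cache_is_populated() else "empty or missing"
    print(f"torch.compile cache at {DEFAULT_CACHE_DIR}: {state}")
```

A check like this fits naturally into a container entrypoint or readiness script, so a missing cache shows up as a log line rather than as an unexplained slow start.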
Recommended action:
Copy the entire ~/.cache/vllm/torch_compile_cache directory from an instance that has already compiled the model to each deployment target. Subsequent starts then load the cached artifacts instead of repeating the expensive compilation step.
To disable the cache while debugging, set VLLM_DISABLE_COMPILE_CACHE=1. To inspect the compiled code, save the cache in unpacked form by setting VLLM_COMPILE_CACHE_SAVE_FORMAT=unpacked, or set compile_cache_save_format=unpacked in the compilation config.
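The two actions above can be sketched in Python as follows. This is a minimal illustration, assuming the target cache path is reachable from the machine running the script (for example a shared volume or an image build step); copying to a remote host would need a tool such as rsync instead. `copy_compile_cache` and `vllm_debug_env` are hypothetical helpers. Only the environment variable names and values come from this entry.

```python
import os
import shutil
from pathlib import Path


def copy_compile_cache(src: Path, dst: Path) -> None:
    """Copy a populated torch_compile_cache tree to dst, merging into dst if it already exists."""
    if not src.is_dir():
        raise FileNotFoundError(f"no compile cache at {src}")
    # copytree creates missing parent directories; dirs_exist_ok=True (Python 3.8+)
    # allows merging into an existing cache directory instead of raising.
    shutil.copytree(src, dst, dirs_exist_ok=True)


def vllm_debug_env(disable_cache: bool = False, unpacked: bool = False) -> dict:
    """Return a copy of the current environment with the debug settings from this entry applied."""
    env = dict(os.environ)
    if disable_cache:
        env["VLLM_DISABLE_COMPILE_CACHE"] = "1"
    if unpacked:
        env["VLLM_COMPILE_CACHE_SAVE_FORMAT"] = "unpacked"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching `vllm serve`, which keeps the debug settings scoped to that one process rather than to the whole shell session.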