Cold Start Cache Miss Cascade
Warning: After a Milvus restart or query node scaling event, indexes and segments must be loaded from storage into memory. Until caches warm up, queries can run dramatically slower (seconds instead of milliseconds), and on-disk indexes such as DiskANN are hit hardest.
Track query latency immediately following pod restarts or scaling events. Monitor disk I/O saturation metrics and cache hit rates. Alert when P99 latency exceeds 10x the normal baseline within 15 minutes of a restart, or when disk cache load time metrics spike.
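The latency rule above can be sketched as a small predicate. This is a minimal illustration, not a Milvus or Prometheus API: the function name `cold_start_alert` and its parameters are hypothetical, and it assumes the P99 value, the baseline, and the time since restart are already collected by your monitoring stack.

```python
def cold_start_alert(
    p99_ms: float,
    baseline_p99_ms: float,
    seconds_since_restart: float,
    factor: float = 10.0,          # ">10x" threshold from the rule above
    window_s: float = 15 * 60,     # 15-minute window after a restart
) -> bool:
    """Return True when P99 latency exceeds factor x baseline inside the post-restart window."""
    if baseline_p99_ms <= 0:
        raise ValueError("baseline_p99_ms must be positive")
    # Only latency spikes that follow a restart or scaling event count as cold-start signals.
    in_window = 0 <= seconds_since_restart <= window_s
    return in_window and p99_ms > factor * baseline_p99_ms
```

A spike outside the window returns False here by design, so that unrelated latency regressions are left to other alerts.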
Implement pre-warming procedures that load frequently accessed segments before routing production traffic to restarted nodes. Use progressive traffic shifting during scale-out. Consider HNSW or IVF indexes for latency-sensitive workloads instead of DiskANN. Enable MMAP to balance memory usage and cold-start impact.
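One way to picture pre-warming and progressive traffic shifting is the sketch below. It is illustrative only: `warm_up` and `ramp_schedule` are hypothetical helpers, `search` stands in for whatever client call issues a query against the restarted node, and the 5% starting share and doubling factor are assumed values, not Milvus defaults.

```python
import time
from typing import Callable, Iterable, List, Sequence


def warm_up(
    search: Callable[[Sequence[float]], object],   # hypothetical: runs one query on the new node
    queries: Iterable[Sequence[float]],            # representative query vectors
    target_ms: float,
    max_rounds: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> bool:
    """Replay queries until the slowest query in a full round is within target_ms."""
    query_list = list(queries)
    if not query_list:
        return False
    for _ in range(max_rounds):
        worst_ms = 0.0
        for q in query_list:
            start = clock()
            search(q)
            worst_ms = max(worst_ms, (clock() - start) * 1000.0)
        if worst_ms <= target_ms:
            return True   # caches look warm enough to start taking traffic
    return False          # still cold after max_rounds; keep the node out of rotation


def ramp_schedule(start_pct: int = 5, factor: int = 2) -> List[int]:
    """Traffic percentages for a gradual shift, ending at 100."""
    if start_pct <= 0 or factor < 2:
        raise ValueError("start_pct must be positive and factor at least 2")
    steps: List[int] = []
    pct = start_pct
    while pct < 100:
        steps.append(pct)
        pct *= factor
    steps.append(100)
    return steps
```

In this sketch a node would receive traffic only after `warm_up` returns True, and its share would then be stepped through `ramp_schedule()` while latency is watched at each step.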