When Milvus transitions from index building to serving concurrent searches, CPU usage spikes from roughly 21 cores to a 28-core peak, creating a temporary bottleneck: subsequent requests queue up and in-queue latency rises.
After a Milvus restart or a query-node scaling event, indexes and segments must be reloaded from storage into memory, so queries run orders of magnitude slower (seconds instead of milliseconds) until the caches warm up. On-disk indexes such as DiskANN are hit hardest.
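A common mitigation is to fire a few throwaway searches right after loading the collection, so the first real user query does not pay the cold-start cost. A minimal client-side sketch, where `search_fn` stands in for your actual search call (e.g. a wrapper around pymilvus `Collection.search`):

```python
import random

def warm_up(search_fn, dim, nq=8, rounds=3):
    """Issue throwaway searches with random vectors so segment and index
    caches are populated before real traffic arrives.

    search_fn: callable taking a list of query vectors -- in practice a
    thin wrapper around your Milvus client's search call (assumption).
    dim: vector dimensionality of the collection.
    """
    for _ in range(rounds):
        queries = [[random.random() for _ in range(dim)] for _ in range(nq)]
        search_fn(queries)  # results are discarded; we only want the I/O

# Example with a stub in place of a real client:
calls = []
warm_up(lambda q: calls.append(len(q)), dim=4)
```

For DiskANN this matters most, since each cold query otherwise pays for disk reads on the search path.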
High-frequency upsert operations generate many small, unindexed segments that force query nodes to scan raw data instead of using optimized indexes, sharply increasing vector search latency and CPU usage until compaction completes.
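If latency degrades after a burst of upserts, triggering compaction manually and waiting for it to finish can restore index-backed search sooner. A hedged sketch of the waiting side; `is_done` stands in for a check against your client's compaction state (e.g. a wrapper around pymilvus `Collection.get_compaction_state()`):

```python
import time

def wait_for_compaction(is_done, timeout_s=300, poll_s=5):
    """Poll until compaction completes or the timeout expires.

    is_done: callable returning True once compaction has finished --
    in practice a wrapper around the client's compaction-state call
    (assumption; check your client version).
    Returns True on completion, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_s)
    return False
```

On the triggering side, pymilvus exposes `Collection.compact()`; once the small sealed segments are merged, queries return to the indexed path.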
Index building and optimization phases drive memory to its peak (up to 6.6 GB observed). Combined with concurrent search load or an undersized memory allocation, this can trigger OOM conditions or force excessive disk I/O via MMAP, degrading performance.
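A back-of-envelope sizing check helps decide whether an index build will fit alongside search traffic. The overhead factors below are illustrative assumptions, not measured constants:

```python
def estimate_index_build_bytes(num_vectors, dim, bytes_per_val=4,
                               index_overhead=1.5, build_overhead=2.0):
    """Rough peak-memory estimate for building a float-vector index.

    raw data:       num_vectors * dim * bytes_per_val
    index_overhead: extra structures (graph links, quantization tables)
    build_overhead: transient copies held during the build itself
    Both factors are illustrative assumptions -- benchmark your own setup.
    """
    raw = num_vectors * dim * bytes_per_val
    return int(raw * index_overhead * build_overhead)

# e.g. 1M x 768-dim float32 vectors: ~3.07 GB raw, ~9.2 GB peak by this model
peak = estimate_index_build_bytes(1_000_000, 768)
```

When the estimate exceeds available memory, Milvus's mmap option (the `mmap.enabled` collection property in recent releases) trades RAM for disk I/O; it avoids OOM kills but, as noted above, brings its own latency cost.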
Search requests with inefficient filter expressions or missing scalar indexes trigger full collection scans instead of targeted subset searches, causing scalar filter latency to dominate total query time and sharply reducing throughput.
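The fix is twofold: build a scalar index on the filtered field and keep filter expressions selective. A pymilvus-style sketch with hypothetical field names (the server-side calls are shown as comments, since they need a live cluster):

```python
# Hypothetical schema: "category"/"price" scalar fields, "embedding" vector.
# INVERTED is a scalar index type in recent Milvus releases.
scalar_index = {"index_type": "INVERTED"}

# A selective expression lets query nodes prune candidates up front instead
# of evaluating the filter row-by-row over the whole collection:
expr = 'category == "electronics" and price < 100'

# Server-side calls (pymilvus-style sketch, requires a running cluster):
# collection.create_index(field_name="category", index_params=scalar_index)
# collection.search(data=query_vectors, anns_field="embedding",
#                   param=search_params, limit=10, expr=expr)
```

Without the scalar index, the same `expr` still returns correct results, but the filter is evaluated against raw scalar data, which is where the dominating latency comes from.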
Search requests with very large NQ (number of queries per request) monopolize query node resources for extended periods, causing other concurrent requests to queue and experience elevated latency even though per-vector processing time remains normal.
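A client-side mitigation is to cap NQ per request and split large batches, letting the scheduler interleave other traffic between sub-batches. `max_nq=64` here is an illustrative cap, not a Milvus-recommended value:

```python
def split_queries(vectors, max_nq=64):
    """Split one oversized query batch into sub-batches of at most max_nq
    vectors each, to be sent as separate search requests."""
    return [vectors[i:i + max_nq] for i in range(0, len(vectors), max_nq)]

# A 200-query batch becomes four requests of 64, 64, 64, and 8 queries:
batches = split_queries(list(range(200)), max_nq=64)
```

Total work is unchanged, but no single request holds a query node for the full duration, so concurrent requests see lower in-queue latency.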
When Strong consistency is enabled, queries must wait for tSafe (time-safe) synchronization across all nodes before executing, adding significant latency overhead, especially in distributed deployments or during periods of high write throughput.
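pymilvus lets you override the consistency level per request, so latency-sensitive reads can drop below Strong while reads that must see the latest writes keep it. Level names follow the Milvus docs; the helper function is just an illustration:

```python
# Common Milvus consistency levels, strongest (freshest) to weakest
# (fastest); there is also "Session" -- see the Milvus docs:
LEVELS = ("Strong", "Bounded", "Eventually")

def pick_level(needs_read_your_writes):
    """Trade freshness for latency: Strong waits for tSafe to catch up to
    the latest write; Bounded tolerates a small staleness window."""
    return "Strong" if needs_read_your_writes else "Bounded"

# Per-request override (pymilvus-style sketch, needs a live cluster):
# collection.search(data=queries, anns_field="embedding", param=params,
#                   limit=10, consistency_level=pick_level(False))
```

Under heavy write load, Bounded or Eventually avoids the tSafe wait entirely, which is often where the latency overhead described above disappears.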
When etcd pods crash or enter crash-loop states due to data corruption, PVC issues, or member ID problems, Milvus loses its metadata store, causing all coordinator components to fail and bringing down the entire cluster.