Milvus

Query Node CPU Spike During Search-Phase Transition

warning
Resource Contention
Updated Nov 3, 2025

When a Milvus workload transitions from index building to concurrent searches, query node CPU usage spikes from about 21 cores to a peak of 28 cores (roughly a 33% increase). The spike creates a temporary bottleneck: subsequent requests queue up and in-queue latency rises.

How to detect:

Monitor CPU core utilization across query nodes during workload transitions. Watch for a sharp rise in CPU usage (more than 30% above the level just before the transition) when concurrent search load begins, accompanied by rising in-queue latency. The pattern shows up as a distinct peak in CPU metrics at the boundary between the indexing and search phases.
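
The detection rule above can be sketched as a small check over exported metric samples. This is a minimal illustration, not a Milvus API: the `Sample` type, the `find_transition_spikes` function, the three-sample baseline window, and the choice to compare against a trailing mean are all assumptions made for the example. Only the 30% threshold and the pairing of CPU with in-queue latency come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """One scrape of query node metrics (hypothetical shape)."""
    cpu_cores: float          # cores in use, summed across query nodes
    queue_latency_ms: float   # in-queue latency for search requests


def find_transition_spikes(
    samples: Sequence[Sample],
    window: int = 3,             # assumed baseline length, in samples
    cpu_increase: float = 0.30,  # the >30% threshold from the text
) -> List[int]:
    """Return indices where CPU exceeds the trailing-mean baseline by more
    than `cpu_increase` while in-queue latency is also above its baseline."""
    hits = []
    for i in range(window, len(samples)):
        prev = samples[i - window:i]
        cpu_base = sum(s.cpu_cores for s in prev) / window
        lat_base = sum(s.queue_latency_ms for s in prev) / window
        cur = samples[i]
        cpu_spiked = cpu_base > 0 and cur.cpu_cores > cpu_base * (1 + cpu_increase)
        latency_rising = cur.queue_latency_ms > lat_base
        if cpu_spiked and latency_rising:
            hits.append(i)
    return hits


# Example: a steady 21 cores during indexing, then 28 cores once searches start.
series = [
    Sample(21, 5), Sample(21, 5), Sample(21, 5),
    Sample(28, 12), Sample(28, 20),
]
spike_indices = find_transition_spikes(series)
```

Requiring both conditions keeps the check from firing on CPU increases that do not actually delay requests. Because the baseline is a trailing mean, only the leading edge of a sustained plateau is flagged.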

Recommended action:

Scale out query nodes to handle transition load spikes, or implement request rate limiting during known transition periods. Pre-warm query nodes before switching workloads. Consider dedicating separate query nodes for search vs. indexing operations to avoid resource contention.
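
As one way to picture the rate-limiting option, the sketch below is a client-side token bucket that an application could consult before issuing each search during a known transition period. It is an assumption-laden illustration: `TokenBucket`, `try_acquire`, and the rate and capacity values are hypothetical names and numbers, and nothing here calls Milvus itself. Server-side limits, if available in your deployment, may be a better fit.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Example with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # third call exceeds the burst
fake_now[0] = 0.5                                 # half a second later: one token refilled
after_wait = bucket.try_acquire()
```

A caller that gets `False` can delay or drop the search, so the backlog stays on the client rather than building up in the query node queue during the spike.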