High-NQ Request Monopolization
Warning: Search requests with very large NQ (number of queries per request) monopolize query node resources for extended periods, causing other concurrent requests to queue and experience elevated latency even though per-vector processing time remains normal.
Parse slow query logs for requests with a large NQ but a small durationPerNq: a long total duration combined with normal per-vector time points to request size, not slow search, as the cause. Watch for in-queue latency rising across all queries while a single large-NQ request dominates total request duration. Alert when queue time exceeds 100ms or when a request's NQ exceeds a threshold (e.g., >1000 vectors per request).
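The detection rules above can be sketched as a small check over already-parsed slow query records. This is a minimal illustration, not a parser for any real log format: the `SlowQueryRecord` fields, the function names, and the 5 ms per-vector cutoff are all assumptions made for the example. Only the 100ms queue time and the NQ of 1000 come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlowQueryRecord:
    """Hypothetical shape of one parsed slow query log entry."""
    nq: int             # number of query vectors in the request
    duration_ms: float  # total request duration
    queue_ms: float     # time the request spent waiting in the queue


NQ_THRESHOLD = 1000          # from the text: >1000 vectors per request
QUEUE_MS_THRESHOLD = 100.0   # from the text: queue time above 100ms
PER_NQ_MS_THRESHOLD = 5.0    # assumed cutoff for "normal" per-vector time


def duration_per_nq(rec: SlowQueryRecord) -> float:
    """Average processing time per query vector, in milliseconds."""
    return rec.duration_ms / max(rec.nq, 1)


def is_monopolizing(rec: SlowQueryRecord) -> bool:
    """Large NQ with normal per-vector time: slow because of size alone."""
    return rec.nq > NQ_THRESHOLD and duration_per_nq(rec) <= PER_NQ_MS_THRESHOLD


def find_alerts(records: List[SlowQueryRecord]) -> List[str]:
    """Return one message per rule violation found in the records."""
    alerts = []
    for i, rec in enumerate(records):
        if is_monopolizing(rec):
            alerts.append(
                f"record {i}: NQ={rec.nq} with "
                f"{duration_per_nq(rec):.2f} ms per vector"
            )
        if rec.queue_ms > QUEUE_MS_THRESHOLD:
            alerts.append(f"record {i}: queue time {rec.queue_ms:.0f} ms")
    return alerts
```

A request flagged by `is_monopolizing` that is followed by several records flagged for queue time is the pattern this entry describes.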
Implement application-side batching to limit NQ per request (e.g., max 100-500 vectors). Scale out query nodes to handle higher concurrent load. Configure request rate limiting or queue depth limits to prevent monopolization. Consider separate query node pools for small vs. large batch requests.
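As one way to apply the application-side batching described above, the sketch below splits a large set of query vectors into requests of bounded NQ. It assumes a caller-supplied `search_fn` that takes a batch of vectors and returns one result per vector. That callable, the helper names, and the default of 200 are hypothetical and stand in for whatever client call the application actually uses.

```python
from typing import Callable, List, Sequence, TypeVar

Vector = Sequence[float]
Result = TypeVar("Result")

MAX_NQ_PER_REQUEST = 200  # assumed default, within the 100-500 range above


def chunked(vectors: Sequence[Vector], max_nq: int) -> List[List[Vector]]:
    """Split the vectors into consecutive batches of at most max_nq."""
    if max_nq < 1:
        raise ValueError("max_nq must be at least 1")
    return [list(vectors[i:i + max_nq]) for i in range(0, len(vectors), max_nq)]


def batched_search(
    vectors: Sequence[Vector],
    search_fn: Callable[[List[Vector]], List[Result]],
    max_nq: int = MAX_NQ_PER_REQUEST,
) -> List[Result]:
    """Issue one bounded request per batch and keep results in input order."""
    results: List[Result] = []
    for batch in chunked(vectors, max_nq):
        # search_fn is a placeholder for the real client search call.
        results.extend(search_fn(batch))
    return results
```

Sending the batches sequentially, as here, lets other requests interleave between them. Sending them concurrently would shorten wall-clock time for the caller but reintroduce some of the contention the limit is meant to avoid.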