Milvus

Scalar Filter Full-Scan Latency Explosion

critical
latency

Updated Oct 2, 2025

When a search request filters on a field that has no scalar index, or uses an inefficient filter expression, Milvus scans the full collection instead of searching a targeted subset. Scalar filter latency then dominates total query time and throughput drops sharply.

How to detect:

Monitor the ratio of scalar filter latency to total search latency. Alert when scalar filtering consumes more than 40% of search time, or when filter latency rises suddenly while vector search latency stays stable. Review queries that use long OR chains or that filter on JSON fields without a suitable index.
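The two alert conditions can be expressed as a small check over averaged latency windows. This is a minimal sketch, not a Milvus API: the LatencyWindow type, the detect function, and the spike_factor and stable_tolerance defaults are illustrative assumptions. Only the 40% threshold comes from the guidance above. How the latencies are collected depends on your metrics pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyWindow:
    """Mean latencies (ms) over one observation window. Hypothetical type."""
    filter_ms: float  # scalar filter stage
    vector_ms: float  # vector search stage
    total_ms: float   # end-to-end search latency


def filter_share(window: LatencyWindow) -> float:
    """Fraction of total search time spent in scalar filtering."""
    return window.filter_ms / window.total_ms if window.total_ms > 0 else 0.0


def detect(
    current: LatencyWindow,
    baseline: LatencyWindow,
    share_threshold: float = 0.40,   # from the guidance: alert above 40%
    spike_factor: float = 2.0,       # assumption: "sudden increase" means >= 2x baseline
    stable_tolerance: float = 0.20,  # assumption: "stable" means within 20% of baseline
) -> list[str]:
    """Return the names of the alert conditions that fire for this window."""
    reasons: list[str] = []

    # Condition 1: filtering dominates total search time.
    if filter_share(current) > share_threshold:
        reasons.append("filter_share")

    # Condition 2: filter latency jumped while vector latency held steady.
    filter_spiked = (
        baseline.filter_ms > 0
        and current.filter_ms >= spike_factor * baseline.filter_ms
    )
    vector_stable = (
        baseline.vector_ms > 0
        and abs(current.vector_ms - baseline.vector_ms)
        <= stable_tolerance * baseline.vector_ms
    )
    if filter_spiked and vector_stable:
        reasons.append("filter_spike")

    return reasons
```

Comparing against a baseline window, rather than a fixed millisecond limit, keeps the second condition meaningful across collections of different sizes.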

Recommended action:

Create scalar indexes on every field used in filter expressions. Replace long OR chains with IN expressions. Use filter expression templating to reduce parsing overhead. For JSON fields, use the path and flat indexes introduced in Milvus 2.6. Where possible, simplify complex filter expressions at the application layer.
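As an illustration of the OR-chain and templating advice, an application-side helper can emit a single templated IN expression instead of concatenating equality terms. This is a sketch under stated assumptions: build_in_filter is a hypothetical helper, not part of pymilvus, and the placeholder naming scheme is arbitrary.

```python
from __future__ import annotations

from typing import Any


def build_in_filter(field: str, values: list[Any]) -> tuple[str, dict[str, Any]]:
    """Build a templated IN filter in place of a long OR chain.

    Instead of: color == "red" or color == "green" or ...
    this returns: ('color IN {color_values}', {'color_values': [...]})
    so the expression text stays short and constant while the value
    list travels separately as a template parameter.
    """
    if not field.isidentifier():
        raise ValueError(f"unexpected field name: {field!r}")

    # Drop duplicate values while preserving the caller's order.
    unique = list(dict.fromkeys(values))
    if not unique:
        raise ValueError("values must not be empty")

    placeholder = f"{field}_values"  # hypothetical naming convention
    return f"{field} IN {{{placeholder}}}", {placeholder: unique}


expr, params = build_in_filter("color", ["red", "green", "red"])
print(expr)    # color IN {color_values}
print(params)  # {'color_values': ['red', 'green']}
```

Assuming a pymilvus release that supports filter templating, the pair would be passed to a search as filter=expr and filter_params=params. The helper only shapes the expression; the filtered field still needs a scalar index for the filter to avoid a full scan.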