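Frequent hard commits ensure data durability, but when they also open a new searcher (openSearcher=true) each commit invalidates caches and triggers warming, which spikes memory usage and can degrade performance. Infrequent commits improve indexing throughput but leave more uncommitted updates at risk after a crash. Commit timing therefore directly affects query latency during indexing.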
Oversized caches (filterCache, queryResultCache, documentCache) waste heap and can lead to OOM errors, while undersized caches reduce hit rates and degrade query performance. A persistently low hit ratio means the memory allocated to that cache is largely wasted.
Sudden drops in cache hit ratios (filterCache, queryResultCache, documentCache) often precede query performance degradation. Such a drop points to cache invalidation events, configuration changes, or query pattern shifts, and warrants investigation.
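One way to detect such a drop is to compare hit ratios over successive sampling windows rather than over the lifetime of the cache. The sketch below assumes each sample is a dict with cumulative "hits" and "lookups" counters scraped from cache statistics; the dict shape, the function names, and the 0.2 drop threshold are illustrative assumptions, not Solr defaults.

```python
from typing import Optional


def windowed_hit_ratio(prev: dict, curr: dict) -> Optional[float]:
    """Hit ratio for the interval between two cumulative counter samples.

    Returns None when there were no lookups in the window or when the
    counters went backwards (for example, after the counters were reset).
    """
    d_hits = curr["hits"] - prev["hits"]
    d_lookups = curr["lookups"] - prev["lookups"]
    if d_lookups <= 0 or d_hits < 0:
        return None
    return d_hits / d_lookups


def ratio_dropped(previous: Optional[float], current: Optional[float],
                  max_drop: float = 0.2) -> bool:
    """True when the ratio fell by at least max_drop (absolute) between windows."""
    if previous is None or current is None:
        return False  # not enough data to compare
    return (previous - current) >= max_drop
```

Comparing windows instead of lifetime totals matters because a long-running cache with millions of historical hits can hide a recent collapse in hit rate for a long time.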
Simple, inexpensive queries (e.g., single-term searches) become slow when CPU is saturated by other operations like indexing spikes, causing queries that should complete in <100ms to take 400ms+.
Long or frequent garbage collection pauses directly cause query timeouts and latency spikes, particularly when heap usage remains consistently high (>80%), creating a cascading effect on query performance.
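The "consistently high" condition can be checked by looking for a run of consecutive heap samples above the threshold instead of reacting to a single spike. This is a minimal sketch: the (used_bytes, max_bytes) sample format, the function name, and the default of five consecutive samples are assumptions chosen for illustration; only the 80% threshold comes from the text above.

```python
from typing import Iterable, Tuple


def sustained_high_heap(samples: Iterable[Tuple[float, float]],
                        threshold: float = 0.80,
                        min_consecutive: int = 5) -> bool:
    """True if heap usage exceeded `threshold` for `min_consecutive` samples in a row.

    Each sample is (used_bytes, max_bytes), ordered oldest to newest.
    """
    run = 0
    for used, maximum in samples:
        if maximum > 0 and used / maximum > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # usage fell back below the threshold; start counting again
    return False
```

A heap that sawtooths between 60% and 90% is normal for a generational collector, so requiring a run of high samples avoids alerting on healthy behaviour.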
Breaking down query time by search component (QueryComponent, FacetComponent, HighlightComponent) reveals which specific operation is expensive, enabling targeted optimization rather than generic query tuning.
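A per-component breakdown is available in the "timing" section of the debug output when a query is run with debug=timing. The sketch below ranks components by time for one phase, assuming the timing section has already been parsed from the JSON response into a dict of the form {"process": {"time": ..., "<component>": {"time": ...}}}; the function name and the sample numbers in the usage note are made up for illustration.

```python
from typing import List, Tuple


def slowest_components(timing: dict, phase: str = "process") -> List[Tuple[str, float]]:
    """Return (component, milliseconds) pairs for one phase, slowest first."""
    phase_data = timing.get(phase, {})
    per_component = [
        (name, float(entry["time"]))
        for name, entry in phase_data.items()
        # Skip the phase-level "time" total, which is a number rather than a dict.
        if isinstance(entry, dict) and "time" in entry
    ]
    return sorted(per_component, key=lambda item: item[1], reverse=True)
```

For example, with invented timings of 20 ms for "query", 95 ms for "facet", and 13 ms for "highlight" in the process phase, the function lists "facet" first, which points the tuning effort at faceting rather than at the main query.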
Slow indexing rates accompanied by high disk I/O utilization indicate a storage bottleneck. Merge operations compete with indexing writes for disk bandwidth, creating a throughput ceiling and potentially raising query latency while merges run.
Fields used for faceting, sorting, or grouping without docValues=true consume excessive JVM heap via fieldCache, particularly for high-cardinality fields, leading to memory pressure and potential OOM failures.
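A simple audit can flag such fields before they cause memory pressure. The sketch below assumes a list of field definitions with their effective properties already resolved (for instance, merged from the field and its field type, since docValues can be inherited), plus a list of field names known to be used for faceting, sorting, or grouping. The function name and the input shapes are hypothetical.

```python
from typing import Dict, Iterable, List


def fields_missing_docvalues(fields: Iterable[Dict],
                             used_fields: Iterable[str]) -> List[str]:
    """Return names of used fields whose definition lacks docValues=true.

    `fields` holds dicts such as {"name": "category", "docValues": True}.
    Names not present in `fields` are ignored rather than reported.
    """
    by_name = {field["name"]: field for field in fields}
    missing = []
    for name in used_fields:
        definition = by_name.get(name)
        if definition is not None and not definition.get("docValues", False):
            missing.append(name)
    return missing
```

Treating an absent "docValues" key as false is a conservative choice for this sketch; if the definitions are not fully resolved, the result may include fields that actually inherit docValues from their type.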
Queries with large 'start' parameters (deep pagination) or an excessive 'rows' value force Solr to collect and sort the top start + rows documents before returning the requested page, so memory use grows with the offset and can cause unexpected memory exhaustion even for simple queries.
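The cost is easy to estimate with a little arithmetic. In a distributed search, each shard has to produce its own top start + rows entries, and the coordinating node merges the candidates from every shard. The helper below is only a back-of-the-envelope sketch; the function name and the returned dict keys are illustrative.

```python
from typing import Dict


def docs_collected(start: int, rows: int, num_shards: int = 1) -> Dict[str, int]:
    """Estimate how many entries are collected to serve one page of results."""
    if start < 0 or rows < 0 or num_shards < 1:
        raise ValueError("start and rows must be >= 0 and num_shards >= 1")
    per_shard = start + rows  # each shard keeps its own top start + rows entries
    return {
        "per_shard": per_shard,
        "coordinator": per_shard * num_shards,  # candidates merged by the coordinator
    }
```

For example, docs_collected(1_000_000, 10, num_shards=8) gives 1,000,010 entries per shard and 8,000,080 candidates at the coordinator, all to return a page of 10 documents.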
A single failed or unresponsive shard in a distributed collection causes cascading timeouts across the cluster as queries attempt to reach all shards, eventually making other nodes unresponsive even though they are healthy.