Metadata Operation Latency Cascades from ZooKeeper Contention
warningHigh latency in pulsar_metadata_store_ops_latency_ms_bucket indicates ZooKeeper is struggling with metadata operations (topic creation, subscription management, ownership changes). This blocks broker operations and causes cascading failures across the cluster.
Monitor pulsar_metadata_store_ops_latency_ms_bucket for operations exceeding 100ms. Correlate with ZooKeeper metrics (zkOutstandingRequests, zkNumAliveConnections) to confirm ZooKeeper saturation. Check pulsar_brk_ml_cursor_persistzookeepererrors for acknowledgment state persistence failures falling back to ZooKeeper.
Scale ZooKeeper quorum to handle read load. Enable metadata operation batching to reduce individual operation overhead. Implement ZooKeeper data compression to reduce sync and snapshot times. Ensure managed ledger cache is properly sized to minimize ZooKeeper fallback for cursor persistence. Consider migrating to etcd metadata store for better horizontal scalability.