Get unlimited infrastructure observability context via MCP

Ground your agent engineering in structured observability context

Get the latest observability docs and guidance for your infrastructure via MCP

Apache Kafka
Versions: 4.1 | 171 metrics | 4 documents

Apache Kafka is a distributed streaming platform for high-throughput, fault-tolerant event pipelines and stream processing. Official Apache Kafka docs: Apache Kafka 4.1 Documentation

Native Apache Kafka. Official Apache Kafka metrics docs: Kafka 4.1 Monitoring - JMX Metrics
Datadog. Official Datadog Apache Kafka metrics docs: Datadog Kafka Integration (GitHub)
OpenTelemetry. Official OpenTelemetry Apache Kafka metrics docs: OpenTelemetry Kafka Metrics Receiver Documentation (GitHub)
Prometheus. Official Prometheus Apache Kafka metrics docs: Prometheus JMX Exporter (GitHub)
Dynatrace. Official Dynatrace Apache Kafka metrics docs: Dynatrace Hub - Apache Kafka

Get context for Apache Kafka metrics

| Metric | Source | Type | Units | Description |
|---|---|---|---|---|
| kafka.broker.config.default_replication_factor | Datadog | Gauge | item | Broker configuration for default replication factor. |
| kafka.broker.config.log_retention_bytes | Datadog | Gauge | byte | Broker configuration for log retention in bytes. |
| kafka.broker.config.log_retention_ms | Datadog | Gauge | millisecond | Broker configuration for log retention in milliseconds. |
| kafka.broker.config.log_segment_bytes | Datadog | Gauge | byte | Broker configuration for log segment size in bytes. |
| kafka.broker.config.min_insync_replicas | Datadog | Gauge | item | Broker configuration for minimum in-sync replicas. |
| kafka.broker.config.num_io_threads | Datadog | Gauge | thread | Broker configuration for number of I/O threads. |
| kafka.broker.config.num_network_threads | Datadog | Gauge | thread | Broker configuration for number of network threads. |
| kafka.broker.config.num_partitions | Datadog | Gauge | item | Broker configuration for default number of partitions. |
| kafka.broker.count | Datadog | Gauge | instance | Total number of brokers in the cluster. |
| kafka.broker.leader_count | Datadog | Gauge | item | Number of partitions for which this broker is the leader. |
| kafka.broker.partition_count | Datadog | Gauge | item | Total number of partitions on this broker, including replicas. |
| kafka.broker_offset | Datadog | Gauge | offset | Current message offset on broker. |
| kafka.brokers | OpenTelemetry, Prometheus | Sum (Non-Monotonic) | {brokers} | Number of brokers in the cluster. |
| kafka.cluster.controller_id | Datadog | Gauge | instance | ID of the broker acting as the cluster controller. |
| kafka.connect.connect-metrics.incoming-byte-rate | Dynatrace | Gauge | bytes_per_second | Bytes/second read off all sockets. |
| kafka.connect.connect-metrics.outgoing-byte-rate | Dynatrace | Gauge | bytes_per_second | The average number of outgoing bytes sent per second to all servers. |
| kafka.connect.connect-metrics.request-size-avg | Dynatrace | Gauge | bytes | The average size of all requests in the window. |
| kafka.connector.status | Dynatrace | Gauge | count | Equals 1 if the status is running, 0 otherwise. |
| kafka.connector.task.offset-commit-avg-time-ms | Dynatrace | Gauge | milliseconds | Average time in milliseconds taken by this task to commit offsets. |
| kafka.connector.task.offset-commit-failure-percentage | Dynatrace | Gauge | percent | Average percentage of this task's offset commit attempts that failed. |
| kafka.connector.task.offset-commit-max-time-ms | Dynatrace | Gauge | milliseconds | Maximum time in milliseconds taken by this task to commit offsets. |
| kafka.connector.task.offset-commit-success-percentage | Dynatrace | Gauge | percent | Average percentage of this task's offset commit attempts that succeeded. |
| kafka.connector.task.pause-ratio | Dynatrace | Gauge | count | The fraction of time this task has spent in the pause state. |
| kafka.connector.task.running-ratio | Dynatrace | Gauge | count | The fraction of time this task has spent in the running state. |
| kafka.connector.task.status | Dynatrace | Gauge | count | Equals 1 if the task status is running, 0 otherwise. |
Full metrics context for Apache Kafka is available with an account.

Understanding Apache Kafka observability

Observing Apache Kafka is complex because its distributed architecture spreads telemetry across separate components: brokers, KRaft controllers, producers, and consumers each expose their own metrics. (ZooKeeper mode was removed in Kafka 4.0, so a 4.1 cluster runs KRaft only.) JMX (Java Management Extensions) is Kafka's primary telemetry interface and exposes hundreds of metrics across these components. Because the metrics are JMX-based, monitoring typically requires an exporter such as the Prometheus JMX Exporter, or an agent from a vendor such as Datadog or Dynatrace, to convert them into a form that observability backends can ingest. The OpenTelemetry Collector is a vendor-neutral alternative for Kafka metric collection, though metric names, types, and units differ across backends: Datadog, for example, reports the broker count as the gauge kafka.broker.count, while OpenTelemetry reports it as the non-monotonic sum kafka.brokers.
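To make the JMX-to-backend translation step concrete, here is a minimal Python sketch that flattens a Kafka JMX MBean name into a Prometheus-style metric name plus labels. The naming scheme and the helper names (`parse_mbean`, `to_metric_name`) are assumptions made for illustration; they are not the output format of the Prometheus JMX Exporter or of any vendor agent, each of which applies its own rules.

```python
import re


def parse_mbean(object_name: str) -> tuple[str, dict[str, str]]:
    """Split a JMX ObjectName into its domain and key properties.

    Simplification: assumes property values are unquoted and contain no commas.
    """
    domain, _, props = object_name.partition(":")
    properties: dict[str, str] = {}
    for pair in props.split(","):
        key, _, value = pair.partition("=")
        properties[key.strip()] = value.strip()
    return domain, properties


def to_metric_name(object_name: str) -> tuple[str, dict[str, str]]:
    """Build a flat metric name from domain, type, and name (hypothetical scheme).

    Any remaining key properties (for example topic) are returned as labels.
    """
    domain, props = parse_mbean(object_name)
    mbean_type = props.pop("type", "")
    mbean_name = props.pop("name", "")
    raw = "_".join(part for part in (domain, mbean_type, mbean_name) if part)
    # Replace characters that are not valid in a Prometheus metric name.
    metric = re.sub(r"[^a-zA-Z0-9_]", "_", raw).lower()
    return metric, props


metric, labels = to_metric_name(
    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=orders"
)
print(metric, labels)
```

The same MBean can surface under a different name, type, or unit in each backend, so a catalog that records the source alongside each metric helps avoid mixing conventions.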

Key Use Cases

Resolving Lambda Timeout Issues with Kafka Event Source Mappings
Diagnose and fix timeout problems when Lambda functions consume from Kafka topics, with specific guidance on ESM configuration and batch processing tuning.
Optimizing Kafka Polling Scales for Cost and Performance
Understand the different scaling modes available for Apache Kafka event pollers in Lambda and select the optimal configuration based on throughput requirements and cost constraints.
Implementing Effective CloudWatch Alerting for Kafka-Lambda Pipelines
Set up intelligent alerts using Lambda and Kafka-specific metrics with recommended thresholds that reduce noise while catching real issues early.
Capacity Planning for Event-Driven Workloads
Calculate appropriate Lambda concurrency limits, memory allocations, and Kafka partition counts based on message volume and processing requirements.
Performance Tuning Event Processing Throughput
Apply specific optimizations to reduce latency and increase throughput in Lambda functions consuming from Kafka, including batch size tuning and parallel processing strategies.
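For the capacity-planning use case above, the underlying arithmetic can be sketched in a few lines of Python. This is a rough rule-of-thumb estimate, not guidance taken from this page or from AWS: the function names, the per-partition throughput figures, and the processing time are all hypothetical inputs that you would replace with your own measurements.

```python
import math


def estimate_partitions(
    target_mb_per_s: float,
    producer_mb_per_s_per_partition: float,
    consumer_mb_per_s_per_partition: float,
) -> int:
    """Partitions needed so neither producers nor consumers become the bottleneck."""
    for_producers = math.ceil(target_mb_per_s / producer_mb_per_s_per_partition)
    for_consumers = math.ceil(target_mb_per_s / consumer_mb_per_s_per_partition)
    return max(for_producers, for_consumers)


def estimate_concurrency(messages_per_s: int, ms_per_message: int) -> int:
    """Concurrent workers needed to keep up, by Little's law (rate x time in system)."""
    return math.ceil(messages_per_s * ms_per_message / 1000)


# Assumed example inputs: 100 MB/s target, 10 MB/s per partition on the
# producer side, 20 MB/s per partition on the consumer side.
print(estimate_partitions(100, 10, 20))   # 10
# Assumed example inputs: 2000 messages/s, 50 ms of processing per message.
print(estimate_concurrency(2000, 50))     # 100
```

Both figures are starting points only. Batch size, message size variance, and retries shift real requirements, so validate the estimates against observed consumer lag and function duration.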