Get unlimited infrastructure observability context via MCP

Ground your agent engineering
in structured observability context

Get the latest observability docs and guidance for your infrastructure via MCP

Apache Kafka

Apache Kafka is a distributed streaming platform for high-throughput, fault-tolerant event pipelines and stream processing. Official Apache Kafka docs: Apache Kafka 4.1 Documentation

Native Apache Kafka
Official Apache Kafka metrics docs: Kafka 4.1 Monitoring - JMX Metrics

Datadog
Official Datadog Apache Kafka metrics docs (GitHub): Datadog Kafka Integration

OpenTelemetry
Official OpenTelemetry Apache Kafka metrics docs (GitHub): OpenTelemetry Kafka Metrics Receiver Documentation

Prometheus
Official Prometheus Apache Kafka metrics docs (GitHub): Prometheus JMX Exporter

Dynatrace
Official Dynatrace Apache Kafka metrics docs: Dynatrace Hub - Apache Kafka

Get context for Apache Kafka metrics



Metric | Type | Units | Description
kafka.broker.config.default_replication_factor | Gauge | item | Broker configuration for default replication factor.
kafka.broker.config.log_retention_bytes | Gauge | byte | Broker configuration for log retention in bytes.
kafka.broker.config.log_retention_ms | Gauge | millisecond | Broker configuration for log retention in milliseconds.
kafka.broker.config.log_segment_bytes | Gauge | byte | Broker configuration for log segment size in bytes.
kafka.broker.config.min_insync_replicas | Gauge | item | Broker configuration for minimum in-sync replicas.
kafka.broker.config.num_io_threads | Gauge | thread | Broker configuration for number of I/O threads.
kafka.broker.config.num_network_threads | Gauge | thread | Broker configuration for number of network threads.
kafka.broker.config.num_partitions | Gauge | item | Broker configuration for default number of partitions.
kafka.broker.count | Gauge | instance | Total number of brokers in the cluster.
kafka.broker.leader_count | Gauge | item | Number of partitions for which this broker is the leader.
kafka.broker.partition_count | Gauge | item | Total number of partitions on this broker including replicas.
kafka.broker_offset | Gauge | offset | Current message offset on broker.
kafka.brokers | Sum (Non-Monotonic) | {brokers} | Number of brokers in the cluster.
kafka.cluster.controller_id | Gauge | instance | ID of the broker acting as the cluster controller.
kafka.connect.connect-metrics.incoming-byte-rate | Gauge | bytes_per_second | Bytes/second read off all sockets.
kafka.connect.connect-metrics.outgoing-byte-rate | Gauge | bytes_per_second | The average number of outgoing bytes sent per second to all servers.
kafka.connect.connect-metrics.request-size-avg | Gauge | bytes | The average size of all requests in the window.
kafka.connector.status | Gauge | count | Equals 1 if the status is running, 0 otherwise.
kafka.connector.task.offset-commit-avg-time-ms | Gauge | milliseconds | Average time in milliseconds taken by this task to commit offsets.
kafka.connector.task.offset-commit-failure-percentage | Gauge | percent | Average percentage of this task's offset commit attempts that failed.
kafka.connector.task.offset-commit-max-time-ms | Gauge | milliseconds | Maximum time in milliseconds taken by this task to commit offsets.
kafka.connector.task.offset-commit-success-percentage | Gauge | percent | Average percentage of this task's offset commit attempts that succeeded.
kafka.connector.task.pause-ratio | Gauge | count | The fraction of time this task has spent in the pause state.
kafka.connector.task.running-ratio | Gauge | count | The fraction of time this task has spent in the running state.
kafka.connector.task.status | Gauge | count | Equals 1 if the task status is running, 0 otherwise.
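
As a rough illustration of how these metrics might be consumed, the sketch below runs a few sanity checks over a single snapshot of metric values. The metric names come from the list above; the `check_snapshot` function, the snapshot dictionary shape, and the specific checks are assumptions made for this example, not part of any vendor integration.

```python
from typing import Dict, List


def check_snapshot(metrics: Dict[str, float]) -> List[str]:
    """Return warnings for one snapshot of metric values (hypothetical helper)."""
    warnings: List[str] = []

    rf = metrics.get("kafka.broker.config.default_replication_factor")
    min_isr = metrics.get("kafka.broker.config.min_insync_replicas")
    # With acks=all, writes cannot succeed if min.insync.replicas exceeds
    # the number of replicas a topic actually has.
    if rf is not None and min_isr is not None and min_isr > rf:
        warnings.append(
            f"min_insync_replicas ({min_isr}) exceeds default_replication_factor ({rf})"
        )

    # Both status metrics equal 1 when running and 0 otherwise.
    for name in ("kafka.connector.status", "kafka.connector.task.status"):
        value = metrics.get(name)
        if value is not None and value != 1:
            warnings.append(f"{name} reports a non-running state")

    # Any failed offset commits are flagged; a real alert would likely use
    # a tuned threshold instead of zero.
    failures = metrics.get("kafka.connector.task.offset-commit-failure-percentage")
    if failures is not None and failures > 0:
        warnings.append(f"offset commit failures observed ({failures})")

    return warnings


if __name__ == "__main__":
    snapshot = {
        "kafka.broker.config.default_replication_factor": 2,
        "kafka.broker.config.min_insync_replicas": 3,
        "kafka.connector.status": 0,
        "kafka.connector.task.status": 1,
    }
    for warning in check_snapshot(snapshot):
        print(warning)
```

Missing metrics are skipped, so the same function can run against partial snapshots from different collectors.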

Full metrics context for Apache Kafka is available with an account. Request access

Understanding Apache Kafka observability

Apache Kafka's observability surface is complex because its distributed architecture splits brokers, producers, consumers, and the metadata quorum (KRaft controllers, or ZooKeeper in clusters older than Kafka 4.0) into distinct components that each expose their own telemetry. JMX (Java Management Extensions) is Kafka's primary telemetry interface, and it exposes hundreds of metrics across these components. Because the metrics are published as JMX MBeans, monitoring typically requires an exporter such as the Prometheus JMX Exporter, or an agent from a vendor such as Datadog or Dynatrace, to read them and convert them into the format the observability backend ingests. The OpenTelemetry Collector is a vendor-neutral alternative for Kafka metric collection, though metric names and collection patterns differ across observability backends.
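
To make the JMX-to-backend conversion concrete, here is a minimal sketch of turning a JMX MBean object name into a flat metric name plus labels. The broker MBean `kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec` is a real Kafka MBean; the `topic=orders` property value, the function names, and the naming scheme are assumptions for illustration and do not reproduce the exact rules of the Prometheus JMX Exporter or any vendor agent.

```python
import re
from typing import Dict, Tuple


def parse_object_name(object_name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:key=value,key=value' into its domain and properties.

    Simplification: quoted property values containing commas are not handled.
    """
    domain, _, props = object_name.partition(":")
    properties = dict(p.split("=", 1) for p in props.split(",") if p)
    return domain, properties


def to_metric_name(object_name: str) -> Tuple[str, Dict[str, str]]:
    """Build a flat metric name from domain, type, and name (hypothetical scheme).

    Remaining properties, such as topic, are returned as labels.
    """
    domain, props = parse_object_name(object_name)
    parts = [domain, props.pop("type", ""), props.pop("name", "")]
    raw = "_".join(p for p in parts if p)
    name = re.sub(r"[^a-zA-Z0-9_]", "_", raw).lower()
    return name, props


if __name__ == "__main__":
    mbean = "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=orders"
    metric, labels = to_metric_name(mbean)
    print(metric, labels)
```

Each backend applies its own mapping at this step, which is one reason the same broker measurement can appear under different metric names in different tools.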

Key Use Cases

Resolving Lambda Timeout Issues with Kafka Event Source Mappings →

Diagnose and fix timeout problems when Lambda functions consume from Kafka topics, with specific guidance on ESM configuration and batch processing tuning.

Optimizing Kafka Polling Scales for Cost and Performance →

Understand the different scaling modes available for Apache Kafka event pollers in Lambda and select the optimal configuration based on throughput requirements and cost constraints.

Implementing Effective CloudWatch Alerting for Kafka-Lambda Pipelines →

Set up intelligent alerts using Lambda and Kafka-specific metrics with recommended thresholds that reduce noise while catching real issues early.

Capacity Planning for Event-Driven Workloads →

Calculate appropriate Lambda concurrency limits, memory allocations, and Kafka partition counts based on message volume and processing requirements.

Performance Tuning Event Processing Throughput →

Apply specific optimizations to reduce latency and increase throughput in Lambda functions consuming from Kafka, including batch size tuning and parallel processing strategies.
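
The capacity planning and throughput tuning use cases above come down to simple arithmetic. The sketch below is a back-of-the-envelope estimate, assuming per-partition throughput figures measured in your own environment and that each partition is read by at most one consumer in a group at a time. All function names and input numbers are hypothetical and are not taken from the linked guides.

```python
import math


def required_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
) -> int:
    """Partitions needed so neither producing nor consuming is the bottleneck."""
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


def required_concurrency(
    messages_per_s: float, batch_size: int, seconds_per_batch: float
) -> int:
    """Concurrent consumers needed to keep up: batch arrival rate times batch duration."""
    batches_per_s = messages_per_s / batch_size
    return math.ceil(batches_per_s * seconds_per_batch)


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    partitions = required_partitions(100, 10, 20)
    concurrency = required_concurrency(5000, 100, 0.5)
    # One consumer per partition means the topic needs at least as many
    # partitions as the desired concurrency.
    print(max(partitions, concurrency))
```

Larger batches lower the required concurrency only if the time per batch grows more slowly than the batch size, so both inputs should be measured together.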
