LangSmith · Kubernetes · ClickHouse

Self-Hosted Deployment Health Blind Spots

critical · reliability · Updated Feb 23, 2026

Self-hosted LangSmith instances can suffer infrastructure problems, such as exhausted disk space, resource constraints, and pod failures, that application-level monitoring alone does not detect.

How to detect:

For self-hosted deployments, monitor Kubernetes events, pod status, and ClickHouse disk usage. Track the pending_runs metric to catch queue buildup, and watch for ERROR-level logs in the langsmith-backend, langsmith-platform-backend, and langsmith-queue services.
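The detection signals above can be reduced to a few simple checks. The Python below is a minimal sketch that assumes the raw data has already been collected: pod status as JSON from `kubectl get pods -o json`, recent log lines per service (for example via `kubectl logs`), and periodic pending_runs samples from wherever your deployment exposes that metric. The JSON field names follow the Kubernetes Pod API; the plain "ERROR" substring match, the rising-trend rule, and the `window` size are assumptions, not documented LangSmith behavior.

```python
import json
from collections import Counter

# Service names come from the text above; reading their logs this way is an assumption.
WATCHED_SERVICES = ("langsmith-backend", "langsmith-platform-backend", "langsmith-queue")


def unhealthy_pods(pods_json: str) -> list[tuple[str, str]]:
    """Return (pod name, reason) pairs from `kubectl get pods -o json` output."""
    problems = []
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed Job pods are not failures
        # A container that is not ready usually carries a waiting reason,
        # e.g. CrashLoopBackOff or ImagePullBackOff.
        reasons = [
            cs.get("state", {}).get("waiting", {}).get("reason", "NotReady")
            for cs in status.get("containerStatuses", [])
            if not cs.get("ready", False)
        ]
        if not reasons and phase != "Running":
            reasons = [f"phase={phase}"]  # e.g. Pending with no container status yet
        problems.extend((name, reason) for reason in reasons)
    return problems


def error_counts(logs_by_service: dict[str, list[str]]) -> Counter:
    """Count lines containing "ERROR" per watched service (plain-text logs assumed)."""
    counts: Counter = Counter()
    for service in WATCHED_SERVICES:
        counts[service] = sum("ERROR" in line for line in logs_by_service.get(service, []))
    return counts


def queue_building_up(pending_runs: list[int], window: int = 3) -> bool:
    """True if pending_runs rose over each of the last `window` sampling intervals."""
    recent = pending_runs[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough samples to judge a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strictly rising trend is used here because a growing backlog is the symptom described above; a backlog that is high but stable would need a separate threshold check.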

Recommended action:

Complement application-level monitoring with infrastructure monitoring: inspect failing resources with kubectl describe and analyze service logs. Set up alerts for ClickHouse disk space exhaustion (NOT_ENOUGH_SPACE errors), failed liveness and readiness probes, and image pull errors. Monitor queue depth through the pending_runs metric to detect ingestion bottlenecks.
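The alert rules above can be sketched in Python as pure functions over already-collected data rather than live cluster calls. The sketch assumes events come from `kubectl get events -o json`, ClickHouse server log lines come from something like `kubectl logs` on the ClickHouse pod, disk rows come from ClickHouse's `system.disks` table, and a current pending_runs value is available. The event matching reflects common kubelet wording ("Unhealthy" events with "... probe failed" messages, "ErrImagePull"/"ImagePullBackOff" messages) and may need adjusting for your cluster; the `Alert` type, rule names, the 85% disk threshold, and the 1000-run queue threshold are hypothetical placeholders.

```python
import json
from dataclasses import dataclass

# Query to run against ClickHouse to obtain disk_rows; name, free_space and
# total_space are columns of the system.disks table.
DISK_USAGE_QUERY = "SELECT name, free_space, total_space FROM system.disks"


@dataclass
class Alert:
    rule: str  # hypothetical rule identifier
    detail: str


def event_alerts(events_json: str) -> list[Alert]:
    """Flag probe failures and image pull errors in `kubectl get events -o json` output."""
    alerts = []
    for event in json.loads(events_json).get("items", []):
        obj = event.get("involvedObject", {}).get("name", "?")
        reason = event.get("reason", "")
        message = event.get("message", "")
        if reason == "Unhealthy" and "probe failed" in message:
            # Covers "Liveness probe failed: ..." and "Readiness probe failed: ...".
            alerts.append(Alert("probe_failure", f"{obj}: {message}"))
        elif any(s in message for s in ("ErrImagePull", "ImagePullBackOff", "Failed to pull image")):
            alerts.append(Alert("image_pull_error", f"{obj}: {message}"))
    return alerts


def clickhouse_alerts(log_lines: list[str], disk_rows: list[tuple[str, int, int]],
                      max_used_ratio: float = 0.85) -> list[Alert]:
    """Flag NOT_ENOUGH_SPACE log errors and disks above a placeholder usage ratio."""
    alerts = [Alert("clickhouse_not_enough_space", line.strip())
              for line in log_lines if "NOT_ENOUGH_SPACE" in line]
    for name, free_space, total_space in disk_rows:
        if total_space > 0:
            used = (total_space - free_space) / total_space
            if used >= max_used_ratio:
                alerts.append(Alert("clickhouse_disk_usage", f"{name}: {used:.0%} used"))
    return alerts


def evaluate(events_json: str, clickhouse_log_lines: list[str],
             disk_rows: list[tuple[str, int, int]], pending_runs: int,
             max_pending_runs: int = 1000) -> list[Alert]:
    """Combine all rules; thresholds are hypothetical defaults, not LangSmith guidance."""
    alerts = event_alerts(events_json) + clickhouse_alerts(clickhouse_log_lines, disk_rows)
    if pending_runs > max_pending_runs:
        alerts.append(Alert("queue_depth", f"pending_runs={pending_runs}"))
    return alerts
```

Keeping the rules as pure functions makes them easy to test against captured output; in practice the same matching logic would usually live in whatever alerting system already watches the cluster.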