Self-Hosted Deployment Health Blind Spots
critical | reliability | Updated Feb 23, 2026
Self-hosted LangSmith instances can hit infrastructure-level failures (disk space exhaustion, resource constraints, pod failures) that application-level monitoring alone does not detect.
How to detect:
Monitor Kubernetes events, pod status, and ClickHouse disk usage for self-hosted deployments. Track the pending_runs metric to catch queue buildup. Watch for ERROR logs in the langsmith-backend, langsmith-platform-backend, and langsmith-queue services.
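The pod-status part of this check can be sketched in Python as follows. This is a minimal illustration, not an official LangSmith tool: the function names, the set of waiting reasons, and the default namespace "langsmith" are assumptions. It relies only on the standard fields of `kubectl get pods -o json` output (status.phase and status.containerStatuses).

```python
import json
import subprocess

# Waiting reasons that usually point to a deployment problem rather than a
# normal start-up. This list is an assumption; extend it for your cluster.
BAD_WAITING_REASONS = {"ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff"}


def find_unhealthy_pods(pod_list):
    """Return (pod_name, problem) tuples from parsed `kubectl get pods -o json` output."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # Anything other than Running or Succeeded (Pending, Failed, Unknown) is flagged.
        if phase not in ("Running", "Succeeded"):
            problems.append((name, f"phase={phase}"))
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in BAD_WAITING_REASONS:
                problems.append((name, f"{cs['name']}: {reason}"))
            elif phase == "Running" and not cs.get("ready", False):
                # A running but not-ready container typically means a failing readiness probe.
                problems.append((name, f"{cs['name']}: not ready"))
    return problems


def fetch_pods(namespace="langsmith"):
    """Fetch the pod list via kubectl. The namespace default is hypothetical."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling find_unhealthy_pods(fetch_pods()) on a schedule and running kubectl describe on each flagged pod gives a starting point for investigation. Keeping the parsing separate from the kubectl call makes the check testable without cluster access.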
Recommended action:
Add infrastructure-level monitoring alongside application monitoring: inspect failing pods with kubectl describe and analyze service logs. Set up alerts for ClickHouse disk space exhaustion (NOT_ENOUGH_SPACE errors), failed liveness/readiness probes, and image pull errors. Monitor queue depth via the pending_runs metric to detect ingestion bottlenecks.
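One possible shape for these alerts, sketched in Python under stated assumptions: the alert names, the matched substrings for probe and image pull failures (typical kubelet event text), and the pending_runs threshold of 1000 are all illustrative, not values taken from LangSmith. Only NOT_ENOUGH_SPACE comes from the text above. How pending_runs samples are collected is left to the caller.

```python
from collections import Counter

# Substring per alert. NOT_ENOUGH_SPACE is the ClickHouse error named above;
# the other strings are assumptions based on typical Kubernetes event text.
PATTERNS = {
    "clickhouse_disk_full": "NOT_ENOUGH_SPACE",
    "liveness_probe_failed": "Liveness probe failed",
    "readiness_probe_failed": "Readiness probe failed",
    "image_pull_error": "ImagePullBackOff",
}


def classify_lines(lines):
    """Count how many log or event lines match each alert pattern."""
    counts = Counter()
    for line in lines:
        for alert, needle in PATTERNS.items():
            if needle in line:
                counts[alert] += 1
    return counts


def queue_is_backing_up(samples, threshold=1000, min_consecutive=3):
    """Return True if the last `min_consecutive` pending_runs samples all exceed `threshold`.

    Both defaults are hypothetical; tune them to your normal ingestion load.
    """
    recent = samples[-min_consecutive:]
    return len(recent) == min_consecutive and all(s > threshold for s in recent)
```

Requiring several consecutive samples above the threshold avoids alerting on a short burst that the queue drains on its own. The input to classify_lines can be the combined output of kubectl logs and kubectl get events for the namespace.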