Model Serving Latency SLA Violations
latency
When a Databricks model serving endpoint's latency (time_ms) exceeds its p99 SLA threshold, user-facing applications slow down. These violations often correlate with resource constraints (high cpu_usage_percentage, mem_usage_percentage, or gpu_usage_percentage) or with increased request rates.