Model Serving Latency SLA Violations

latency

When model serving endpoint latency (time_ms) exceeds the p99 SLA threshold, user-facing applications slow down. Latency violations often correlate with resource constraints (cpu_usage_percentage, mem_usage_percentage, gpu_usage_percentage) or with increased request rates.
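The check described above can be sketched in a few lines of Python. This is an illustrative sketch, not a Databricks API: it assumes each request sample is a plain dict carrying the metric names mentioned above (time_ms, cpu_usage_percentage, mem_usage_percentage, gpu_usage_percentage). The function names, the 500 ms SLA threshold, and the 90% resource limit are hypothetical values chosen for the example.

```python
import math

# Metric names taken from the description above; the dict layout is an assumption.
RESOURCE_FIELDS = (
    "cpu_usage_percentage",
    "mem_usage_percentage",
    "gpu_usage_percentage",
)


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_latency_sla(samples, p99_threshold_ms, resource_limit_pct=90.0):
    """Compare p99 latency with the SLA threshold and list the resource
    metrics that were at or above the limit on the slow requests."""
    p99 = percentile([s["time_ms"] for s in samples], 99)
    slow = [s for s in samples if s["time_ms"] > p99_threshold_ms]
    saturated = sorted(
        {
            name
            for s in slow
            for name in RESOURCE_FIELDS
            if s.get(name, 0.0) >= resource_limit_pct
        }
    )
    return {
        "p99_ms": p99,
        "violated": p99 > p99_threshold_ms,
        "slow_requests": len(slow),
        "saturated_resources": saturated,
    }


# Hypothetical data: 98 fast requests and 2 slow ones that coincide with high CPU.
fast = {"time_ms": 50, "cpu_usage_percentage": 40.0,
        "mem_usage_percentage": 55.0, "gpu_usage_percentage": 30.0}
slow = {"time_ms": 900, "cpu_usage_percentage": 97.0,
        "mem_usage_percentage": 60.0, "gpu_usage_percentage": 35.0}
samples = [dict(fast) for _ in range(98)] + [dict(slow) for _ in range(2)]

report = check_latency_sla(samples, p99_threshold_ms=500)
print(report)
```

A resource listed under saturated_resources only indicates that high usage coincided with slow requests. It does not by itself show that the resource caused the violation, so request rate over the same window is worth checking as well.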
