GPU Power Throttling Under Load
Resource Contention
When GPU power draw approaches or exceeds the configured power limit, the GPU lowers its clock speeds to stay within that limit, which degrades inference performance. Power throttling reduces SM and memory clock frequencies, directly lowering throughput and raising latency. Because this is a hardware-level constraint, it surfaces as reduced GPU efficiency rather than as an explicit error.
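As an illustration only, the sketch below flags power-throttled samples from periodically collected GPU metrics. It assumes you already gather power draw, the enforced power limit, current and maximum SM clocks, and the driver's throttle-reason bitmask. The bit values are assumed to mirror NVML's `nvmlClocksThrottleReason*` constants and should be checked against your NVML version. The `GpuSample` type, the helper functions, and the 0.95 headroom threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed to mirror NVML's nvmlClocksThrottleReason* bit values (nvml.h);
# verify against the NVML version in use.
SW_POWER_CAP = 0x4       # software power capping: power limit reached
HW_POWER_BRAKE = 0x80    # hardware power brake slowdown


@dataclass
class GpuSample:
    """One metrics sample (hypothetical structure)."""
    power_w: float          # board power draw in watts
    power_limit_w: float    # enforced power limit in watts
    sm_clock_mhz: int       # current SM clock
    max_sm_clock_mhz: int   # maximum SM clock
    throttle_reasons: int   # bitmask of active clock throttle reasons


def is_power_throttled(s: GpuSample, headroom: float = 0.95) -> bool:
    """Heuristic check for power throttling on a single sample.

    Returns True if the driver reports a power-related throttle reason,
    or if power draw is within `headroom` of the limit while the SM clock
    is below its maximum. The 0.95 default is a hypothetical threshold.
    """
    if s.throttle_reasons & (SW_POWER_CAP | HW_POWER_BRAKE):
        return True
    near_limit = s.power_w >= headroom * s.power_limit_w
    clock_reduced = s.sm_clock_mhz < s.max_sm_clock_mhz
    return near_limit and clock_reduced


def throttled_fraction(samples: Sequence[GpuSample], headroom: float = 0.95) -> float:
    """Fraction of samples flagged as power-throttled (0.0 if no samples)."""
    if not samples:
        return 0.0
    flagged = sum(is_power_throttled(s, headroom) for s in samples)
    return flagged / len(samples)
```

Checking the driver-reported reason first avoids relying only on the power-ratio heuristic. The fallback still catches samples where power sits at the limit while clocks are held below maximum. If you collect metrics through NVML, note that it reports power usage and the enforced limit in milliwatts, so convert them to watts before building samples.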