NCCL Collective Performance Degradation Signals Network Issues
latency
When NCCL Inspector reports lower algorithmic or bus bandwidth than expected for collectives (AllReduce, AllGather, ReduceScatter), the drop usually points to network congestion, misconfiguration, or a hardware fault affecting multi-GPU or multi-node AI workloads.
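To make the two metrics concrete, here is a minimal sketch of how algorithmic and bus bandwidth relate, using the correction factors documented for nccl-tests. It assumes the size is the total payload in bytes as nccl-tests counts it. The function names, the `baseline` value, and the 20% tolerance are hypothetical choices for illustration; they are not part of NCCL Inspector, and its output format is not modeled here.

```python
# Bus-bandwidth correction factors per collective, following the
# nccl-tests convention (n = number of ranks).
BUS_FACTORS = {
    "AllReduce": lambda n: 2 * (n - 1) / n,
    "AllGather": lambda n: (n - 1) / n,
    "ReduceScatter": lambda n: (n - 1) / n,
}


def algorithmic_bandwidth(size_bytes, seconds):
    """Algorithmic bandwidth in GB/s: payload size divided by elapsed time."""
    return size_bytes / seconds / 1e9


def bus_bandwidth(collective, size_bytes, seconds, n_ranks):
    """Bus bandwidth in GB/s: algorithmic bandwidth scaled by the
    collective's correction factor, so results are comparable across
    rank counts and against hardware link speed."""
    factor = BUS_FACTORS[collective](n_ranks)
    return algorithmic_bandwidth(size_bytes, seconds) * factor


def is_degraded(measured_gbps, baseline_gbps, tolerance=0.2):
    """Hypothetical check: flag a measurement that falls more than
    `tolerance` (as a fraction) below a known-good baseline."""
    return measured_gbps < baseline_gbps * (1 - tolerance)


if __name__ == "__main__":
    # Example: an 8 GB AllReduce across 8 ranks that took 0.5 s.
    busbw = bus_bandwidth("AllReduce", 8e9, 0.5, 8)
    print(f"bus bandwidth: {busbw:.1f} GB/s")
    # The baseline of 40 GB/s is an assumed value for illustration only.
    print("degraded:", is_degraded(busbw, baseline_gbps=40.0))
```

Comparing bus bandwidth rather than algorithmic bandwidth against a baseline keeps the check independent of the number of ranks, which is why a drop in it is the more direct hint of a link or fabric problem.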