Multi-Turn Session Context Drift
warningIn conversational AI applications, agent performance may degrade across multiple turns as context drifts, memory becomes incoherent, or earlier information is forgotten. Individual traces appear healthy, but the full conversation reveals deteriorating quality.
Group traces into sessions using session IDs and monitor session-level evaluation metrics. Look for sessions where annotation scores decline over time, where agents fail to reference earlier context, or where response quality degrades after a certain number of turns.
Implement session-level evaluations to measure coherence and goal achievement across full conversations. Use Phoenix's session grouping to identify where conversations break down. Review memory management, context window usage, and conversation summarization strategies. Consider implementing context compression or selective memory retention for long sessions.