Feedback Score Divergence from Online Evaluators
warning · reliability · Updated Feb 23, 2026
LangSmith dashboards track user feedback and online evaluator scores separately. If user scores trend negative while evaluator scores remain stable (or vice versa), the evaluation criteria may be misaligned with real user needs.
How to detect:
Use LangSmith's feedback score charts (grouped by feedback key) to compare user-submitted feedback against online evaluator results. Alert when user scores drop more than 20% below evaluator scores over a rolling window, or when the variance between the two series increases significantly.
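The rolling-window check above can be sketched as a small standalone function. This is an illustrative implementation, not a LangSmith API: the function names, the assumption that both score series are normalized to [0, 1] and aligned per trace, and the default window size are all choices made for this sketch. It covers the first alert condition (user scores dropping below evaluator scores); a variance check would be a separate comparison.

```python
from collections import deque


def detect_divergence(user_scores, evaluator_scores, window=50, threshold=0.2):
    """Flag positions where the rolling mean of user feedback falls more than
    `threshold` (relative) below the rolling mean of online evaluator scores.

    Both inputs are assumed to be aligned, trace-ordered score series in [0, 1].
    Returns a list of (index, user_mean, evaluator_mean) tuples for each
    window that breaches the threshold.
    """
    alerts = []
    u, e = deque(maxlen=window), deque(maxlen=window)
    for i, (us, es) in enumerate(zip(user_scores, evaluator_scores)):
        u.append(us)
        e.append(es)
        if len(u) == window:  # only evaluate once the window is full
            user_mean = sum(u) / window
            eval_mean = sum(e) / window
            # Relative gap: how far user sentiment trails the evaluator.
            if eval_mean > 0 and (eval_mean - user_mean) / eval_mean > threshold:
                alerts.append((i, round(user_mean, 3), round(eval_mean, 3)))
    return alerts
```

In production the two series would be pulled from LangSmith's feedback data (e.g. exported per feedback key) on a schedule, with the alert wired into your existing paging or dashboard tooling.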
Recommended action:
Review online evaluator prompts and criteria for alignment with user expectations. Conduct qualitative analysis of traces with divergent scores. Adjust evaluator logic or retrain classifiers. Use LangSmith's annotation queues to manually audit edge cases.
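To feed the qualitative review and annotation-queue audit described above, it helps to rank traces by how strongly the two score sources disagree. The sketch below assumes a simplified record shape (`trace_id`, `user_score`, `evaluator_score` dicts) rather than LangSmith's actual feedback schema; the `gap` cutoff and field names are illustrative.

```python
def select_divergent_traces(records, gap=0.5, limit=10):
    """Return trace ids whose user and evaluator scores disagree by at least
    `gap`, ordered by disagreement (largest first), capped at `limit`.

    `records` is a list of dicts like
    {"trace_id": "...", "user_score": 0.2, "evaluator_score": 0.9};
    this shape is an assumption for the sketch, not LangSmith's schema.
    """
    scored = [
        (abs(r["user_score"] - r["evaluator_score"]), r["trace_id"])
        for r in records
        if abs(r["user_score"] - r["evaluator_score"]) >= gap
    ]
    scored.sort(reverse=True)  # biggest disagreements first
    return [trace_id for _, trace_id in scored[:limit]]
```

The returned ids are the candidates to push into an annotation queue for manual labeling, which in turn gives you ground truth for adjusting the evaluator prompt or retraining a feedback classifier.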