
Feedback Score Divergence from Online Evaluators

Severity: warning · Category: reliability · Updated Feb 23, 2026

LangSmith dashboards track user feedback and online evaluator scores separately. If user scores trend negative while evaluator scores remain stable (or vice versa), evaluation criteria may be misaligned with real user needs.

How to detect:

Use LangSmith's feedback score charts (grouped by feedback key) to compare user-submitted feedback against online evaluator results. Alert when user scores fall more than 20% below evaluator scores over a rolling window, or when the variance between the two series increases significantly.
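The rolling-window check above can be sketched as a small helper. This is a minimal illustration, not a LangSmith API: the score lists are assumed to be recent feedback values (on a 0-1 scale) exported from the dashboard or fetched via the SDK, and the function name and threshold are illustrative.

```python
from statistics import mean

def divergence_alert(user_scores, evaluator_scores, window=50, threshold=0.20):
    """Return True when the rolling mean of user feedback falls more than
    `threshold` (relative) below the rolling mean of evaluator scores.

    `user_scores` / `evaluator_scores` are lists of recent scores (0-1),
    ordered oldest to newest; only the last `window` values are compared.
    """
    recent_user = user_scores[-window:]
    recent_eval = evaluator_scores[-window:]
    if not recent_user or not recent_eval:
        return False  # not enough data to compare
    user_avg, eval_avg = mean(recent_user), mean(recent_eval)
    if eval_avg == 0:
        return False  # avoid division by zero; nothing to diverge from
    # Relative gap: how far user sentiment sits below evaluator scores.
    return (eval_avg - user_avg) / eval_avg > threshold
```

The same structure extends naturally to the variance check: compute the variance of the per-run score differences over the window and alert when it exceeds a baseline.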

Recommended action:

Review online evaluator prompts and criteria for alignment with user expectations. Conduct qualitative analysis of traces with divergent scores. Adjust evaluator logic or retrain classifiers. Use LangSmith's annotation queues to manually audit edge cases.
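To feed the qualitative review, one way to shortlist traces with divergent scores is to filter runs where the two feedback sources disagree by a wide margin. A minimal sketch, assuming each run has been joined into a dict with hypothetical keys `run_id`, `user_score`, and `evaluator_score` (0-1 scale); the resulting IDs could then be added to an annotation queue for manual audit.

```python
def divergent_runs(runs, gap=0.5):
    """Return run IDs where user feedback and the online evaluator
    disagree by more than `gap`, as candidates for manual annotation.

    `runs` is a list of dicts with keys 'run_id', 'user_score', and
    'evaluator_score' (illustrative shape, not a LangSmith schema).
    """
    return [
        r["run_id"]
        for r in runs
        if abs(r["user_score"] - r["evaluator_score"]) > gap
    ]
```

Auditing the highest-gap traces first tends to surface evaluator-criteria mismatches fastest, since those are the cases where the evaluator's judgment differs most from real users.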