
Feedback Score Divergence from Online Evaluators

Severity: warning · Category: reliability · Updated Feb 23, 2026

LangSmith dashboards track user feedback and online evaluator scores separately. If user scores trend negative while evaluator scores remain stable (or vice versa), evaluation criteria may be misaligned with real user needs.

How to detect:

Use LangSmith's feedback score charts (grouped by feedback key) to compare user-submitted feedback against online evaluator results. Alert when user scores fall more than 20% below evaluator scores over a rolling window, or when the variance between the two series increases significantly.
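The rolling-window check above can be sketched as a small helper. This is a minimal illustration, not a LangSmith API: the score lists are assumed to be recent feedback values (on a 0-1 scale) exported from the dashboard or fetched via the SDK, and the function name and threshold are illustrative.

```python
from statistics import mean

def divergence_alert(user_scores, evaluator_scores, window=50, threshold=0.20):
    """Return True when the rolling mean of user feedback falls more than
    `threshold` (relative) below the rolling mean of evaluator scores.

    `user_scores` / `evaluator_scores` are lists of recent scores (0-1),
    ordered oldest to newest; only the last `window` values are compared.
    """
    recent_user = user_scores[-window:]
    recent_eval = evaluator_scores[-window:]
    if not recent_user or not recent_eval:
        return False  # not enough data to compare
    user_avg, eval_avg = mean(recent_user), mean(recent_eval)
    if eval_avg == 0:
        return False  # avoid division by zero; nothing to diverge from
    # Relative gap: how far user sentiment sits below evaluator scores.
    return (eval_avg - user_avg) / eval_avg > threshold
```

The same structure extends naturally to the variance check: compute the variance of the per-run score differences over the window and alert when it exceeds a baseline.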

Recommended action:

Review online evaluator prompts and criteria for alignment with user expectations. Conduct qualitative analysis of traces with divergent scores. Adjust evaluator logic or retrain classifiers. Use LangSmith's annotation queues to manually audit edge cases.
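To feed the qualitative review, one way to shortlist traces with divergent scores is to filter runs where the two feedback sources disagree by a wide margin. A minimal sketch, assuming each run has been joined into a dict with hypothetical keys `run_id`, `user_score`, and `evaluator_score` (0-1 scale); the resulting IDs could then be added to an annotation queue for manual audit.

```python
def divergent_runs(runs, gap=0.5):
    """Return run IDs where user feedback and the online evaluator
    disagree by more than `gap`, as candidates for manual annotation.

    `runs` is a list of dicts with keys 'run_id', 'user_score', and
    'evaluator_score' (illustrative shape, not a LangSmith schema).
    """
    return [
        r["run_id"]
        for r in runs
        if abs(r["user_score"] - r["evaluator_score"]) > gap
    ]
```

Auditing the highest-gap traces first tends to surface evaluator-criteria mismatches fastest, since those are the cases where the evaluator's judgment differs most from real users.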