🍐 Pairwise Evaluations in LangSmith

DATE: May 15, 2024

AUTHOR: The LangChain Team

For LLM use cases like text generation or chat, where there may not be a single "correct" answer, having an evaluator pick the preferred of two responses (pairwise evaluation) can be an effective approach.

LangSmith’s pairwise evaluation lets you (1) define a custom pairwise LLM-as-judge evaluator with any desired criteria and (2) compare two LLM generations using this evaluator.
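To make the two steps concrete, here is a minimal, library-agnostic sketch in plain Python. It does not use the LangSmith SDK. The names `concise_judge`, `pairwise_scores`, and `win_rates` are hypothetical, and the length-based judge is only a stand-in for a real LLM-as-judge call with your own criteria.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical judge signature (an assumption, not a LangSmith type):
# takes (prompt, answer_a, answer_b) and returns "A", "B", or "tie".
Judge = Callable[[str, str, str], str]


def concise_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Stand-in for an LLM-as-judge call; the criterion here is brevity."""
    if len(answer_a) == len(answer_b):
        return "tie"
    return "A" if len(answer_a) < len(answer_b) else "B"


def pairwise_scores(prompt: str, answer_a: str, answer_b: str, judge: Judge) -> Dict[str, float]:
    """Turn the judge's verdict into a score per candidate (a tie splits the point)."""
    verdict = judge(prompt, answer_a, answer_b)
    if verdict == "A":
        return {"A": 1.0, "B": 0.0}
    if verdict == "B":
        return {"A": 0.0, "B": 1.0}
    return {"A": 0.5, "B": 0.5}


def win_rates(examples: List[Tuple[str, str, str]], judge: Judge) -> Dict[str, float]:
    """Average the pairwise scores over (prompt, answer_a, answer_b) triples."""
    if not examples:
        raise ValueError("need at least one example to compare")
    totals = {"A": 0.0, "B": 0.0}
    for prompt, answer_a, answer_b in examples:
        scores = pairwise_scores(prompt, answer_a, answer_b, judge)
        totals["A"] += scores["A"]
        totals["B"] += scores["B"]
    return {name: total / len(examples) for name, total in totals.items()}
```

In practice the judge would prompt an LLM with both responses and the chosen criteria, and it helps to randomize which response is shown first so that position does not bias the verdict.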

Read the blog post to learn more about pairwise evaluations.
Dive into our video tutorial to walk through an example of how to use custom pairwise evaluators in LangSmith.
Check out the docs.

Bonus: Need to backtest on your production logs? This video shows how pairwise evaluation can also help you compare runs from different versions of your app against the baseline production app.