DATE:
AUTHOR: The LangChain Team
v0.12.7
LangSmith

Evaluate end-to-end agent interactions with Multi-turn Evals

Today, we're launching Multi-turn Evals to help you measure how your agent is performing across an entire end-to-end interaction.

Multi-turn evals are online evaluations that let you measure things like:

  • Semantic intent: What the user was actually trying to do.

  • Semantic outcomes: Whether the task was completed (and if not, why).

  • Agent trajectory: How the interaction unfolded, including tool calls and decisions made along the way.

Multi-turn exchanges between users and agents are represented as threads in LangSmith. If you’re already using threads, setup is simple. Visit our docs to get started, and watch the video demo to learn more.
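
To make the thread idea concrete, here is a minimal sketch in plain Python. It does not call the LangSmith SDK: the `Turn` record, the `thread_id` field, and the helper functions are hypothetical names chosen for illustration, standing in for whatever identifier links your traces into a thread. It only shows how turns that share an identifier can be grouped and how an agent trajectory (the ordered tool calls) might be read off a whole interaction.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    """One traced user/agent exchange (hypothetical record, not a LangSmith type)."""
    thread_id: str
    user_message: str
    agent_reply: str
    tool_calls: List[str] = field(default_factory=list)


def group_by_thread(turns: List[Turn]) -> Dict[str, List[Turn]]:
    """Collect turns that share a thread_id, preserving their original order."""
    threads = defaultdict(list)
    for turn in turns:
        threads[turn.thread_id].append(turn)
    return dict(threads)


def trajectory(thread: List[Turn]) -> List[str]:
    """Flatten the tool calls made across every turn of one thread."""
    return [call for turn in thread for call in turn.tool_calls]


# Three turns from two interleaved conversations (made-up data).
turns = [
    Turn("t1", "Book a flight to Paris", "Which date?", ["search_flights"]),
    Turn("t2", "What's the weather?", "Sunny.", ["get_weather"]),
    Turn("t1", "Next Friday", "Booked.", ["search_flights", "book_flight"]),
]

threads = group_by_thread(turns)
print(trajectory(threads["t1"]))
# prints ['search_flights', 'search_flights', 'book_flight']
```

An evaluator that scores intent or outcome would take the same grouped thread as input, which is why the whole interaction, rather than a single turn, is the unit being measured.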
