Evaluations are a quantitative way to measure the performance of LLM applications. LLMs can behave unpredictably: even small changes to prompts, models, or inputs can significantly affect results. Evaluations provide a structured way to identify failures, compare versions, and build more reliable AI applications. Running an evaluation in LangSmith requires three key components:
  • Dataset: A set of test inputs (and optionally, expected outputs).
  • Target function: The part of your application you want to test—this might be a single LLM call with a new prompt, one module, or your entire workflow.
  • Evaluators: Functions that score your target function’s outputs.
This quickstart guides you through running a starter evaluation that checks the correctness of LLM responses, using either the LangSmith SDK or UI.
If you prefer to watch a video on getting started with datasets and evaluations, refer to the Video guide.
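If you plan to follow the SDK path, it helps to see how the three components fit together before walking through the UI steps. The skeleton below is a minimal sketch, assuming a recent langsmith Python package; the dataset name, target body, and evaluator body are placeholders that later steps fill in.

```python
from langsmith import evaluate

# 1. Dataset: a set of test inputs and reference outputs, referenced by name.
DATASET_NAME = "Quickstart dataset"  # placeholder name

# 2. Target function: the part of your application you want to test.
def target(inputs: dict) -> dict:
    # Call your prompt, model, or workflow here and return its output.
    return {"output": "..."}

# 3. Evaluator: a function that scores the target's outputs.
def correctness(run, example) -> dict:
    # Compare run.outputs against example.outputs and return a score.
    return {"key": "correctness", "score": 1.0}

# Running an evaluation applies the target to every example in the dataset
# and scores each result with every evaluator.
evaluate(target, data=DATASET_NAME, evaluators=[correctness])
```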

Prerequisites

Before you begin, select the UI or SDK filter for instructions:
  • UI
  • SDK

1. Set workspace secrets

In the LangSmith UI for your workspace, ensure that your OpenAI API key is set as a workspace secret.
  1. Navigate to Settings and then select the Secrets tab.
  2. On the Workspace Secrets page, select Add secret, enter OPENAI_API_KEY as the Key, and paste your API key as the Value.
  3. Select Save secret.
When adding workspace secrets in the LangSmith UI, make sure the secret keys match the environment variable names expected by your model provider.
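If you follow the SDK path instead, set the same keys as environment variables before running your evaluation script. A minimal sketch, assuming the variable names read by current langsmith and openai Python packages:

```python
import os

# LangSmith reads its API key (and an optional tracing flag) from the environment.
os.environ["LANGSMITH_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGSMITH_TRACING"] = "true"  # optional: also trace target runs

# The OpenAI client reads the same variable name used for the workspace secret.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
```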

2. Create a prompt

LangSmith’s Prompt Playground makes it possible to run evaluations over different prompts, models, and model configurations.
  1. In the LangSmith UI, navigate to the Playground under Prompt Engineering.
  2. In the Prompts panel, modify the SYSTEM prompt to:
    Answer the following question accurately:
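If you are testing the same prompt from code rather than the Playground, it becomes the system message of your target function. The sketch below is one way to wrap it, assuming the openai Python package; gpt-4o-mini is an illustrative model choice, not one prescribed by this guide.

```python
from openai import OpenAI

oai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "Answer the following question accurately:"

def target(inputs: dict) -> dict:
    """Target function: takes one dataset example's inputs, returns the model's answer."""
    response = oai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; use whichever model you want to test
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": inputs["question"]},
        ],
    )
    return {"output": response.choices[0].message.content}
```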
    

3. Create a dataset

  1. Click Set up Evaluation, which will open a New Experiment table at the bottom of the page.
  2. In the Select or create a new dataset dropdown, click the + New button to create a new dataset.
    (Screenshot: the Playground with the edited system prompt and the New Experiment panel's dropdown for creating a new dataset.)
  3. Add the following examples to the dataset:
    Inputs → Reference Outputs
    • question: Which country is Mount Kilimanjaro located in? → output: Mount Kilimanjaro is located in Tanzania.
    • question: What is Earth’s lowest point? → output: Earth’s lowest point is the Dead Sea.
  4. Click Save and enter a name to save your newly created dataset.
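The same dataset can also be created with the SDK instead of the UI. A minimal sketch using the langsmith Client; the dataset name is an assumption:

```python
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY from the environment

# Create the dataset and add the two question/answer examples shown above.
dataset = client.create_dataset(dataset_name="Quickstart dataset")
client.create_examples(
    inputs=[
        {"question": "Which country is Mount Kilimanjaro located in?"},
        {"question": "What is Earth's lowest point?"},
    ],
    outputs=[
        {"output": "Mount Kilimanjaro is located in Tanzania."},
        {"output": "Earth's lowest point is the Dead Sea."},
    ],
    dataset_id=dataset.id,
)
```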

4. Add an evaluator

  1. Click + Evaluator and select Correctness from the Pre-built Evaluator options.
  2. In the Correctness panel, click Save.
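The pre-built Correctness evaluator is an LLM-as-judge that grades the output against the reference output. On the SDK path you pass your own evaluator functions to evaluate(); the sketch below uses a deliberately naive string check as a stand-in for illustration, not a reimplementation of the pre-built judge:

```python
from langsmith.schemas import Example, Run

def correctness(run: Run, example: Example) -> dict:
    """Score 1.0 if the reference answer text appears in the model's output, else 0.0."""
    predicted = (run.outputs or {}).get("output", "")
    reference = (example.outputs or {}).get("output", "")
    score = 1.0 if reference.strip().lower() in predicted.lower() else 0.0
    return {"key": "correctness", "score": score}
```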

5. Run your evaluation

  1. Select Start at the top right to run your evaluation. This will create an experiment that you can view in full by clicking the experiment name.
    (Screenshot: the full experiment view of the results for the example dataset.)
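On the SDK path, the equivalent of Start is a single evaluate() call that combines the pieces sketched in the earlier steps (the target function, the dataset name, and the correctness evaluator):

```python
from langsmith import evaluate

results = evaluate(
    target,                          # target function from the step 2 sketch
    data="Quickstart dataset",       # dataset name from the step 3 sketch
    evaluators=[correctness],        # evaluator from the step 4 sketch
    experiment_prefix="quickstart-correctness",
)
# evaluate() typically prints a link to the new experiment, which you can open
# in the LangSmith UI to inspect each example's output and score.
```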

Next steps

To learn more about running experiments in LangSmith, read the evaluation conceptual guide.

Video guide

(Embedded video: a walkthrough of datasets and evaluations in LangSmith.)