Pass this function to Task$new() as the scorer to evaluate model responses against the chores eval criteria.

Usage

chores_scorer(
  samples,
  ...,
  scorer_chat = ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest")
)

Arguments

samples

The samples from a solver task, likely retrieved from task$get_samples().

...

Additional arguments passed to the scoring function.

scorer_chat

An ellmer chat object to use for scoring. Defaults to ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest"); this is the scoring model used in "official" results.

Value

A list with the following components:

score

Numeric vector of scores between 0 and 1, representing the proportion of criteria met.

.scorer_metadata

List containing the prompts used for scoring and the detailed grading results.
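
As a rough illustration of what "proportion of criteria met" means for the score component, the sketch below computes such a proportion in base R. The criteria_met list and its values are hypothetical and only stand in for grading results; this is not the package's internal implementation, and the actual shape of .scorer_metadata may differ.

```r
# Hypothetical grading results: one logical vector per sample,
# TRUE where a criterion was judged as met. Illustrative values only.
criteria_met <- list(
  sample_1 = c(TRUE, TRUE, FALSE, TRUE),
  sample_2 = c(TRUE, TRUE, TRUE, TRUE),
  sample_3 = c(FALSE, FALSE, TRUE, FALSE)
)

# The mean of a logical vector is the proportion of TRUE values,
# so each resulting score falls between 0 and 1.
score <- vapply(criteria_met, mean, numeric(1))
score
```

Under this reading, a score of 1 means every criterion was met for that sample and a score of 0 means none were.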

See also

chores_dataset for the dataset this scorer evaluates, and chores_task() to combine this scorer with the dataset and solver.