The chores scorer: chores_scorer (choreseval)

Pass this function to Task$new() as the scorer to grade model responses against the chores eval criteria.

Usage

chores_scorer(
  samples,
  ...,
  scorer_chat = ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest")
)

Arguments

samples: The samples from a solved task, typically retrieved with task$get_samples().
...: Additional arguments passed to the scoring function.
scorer_chat: An ellmer chat object to use for scoring. Defaults to ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest"), which is the scoring model used to produce the "official" results.

Value

A list with the following components:

score: Numeric vector of scores between 0 and 1, representing the proportion of criteria met.
.scorer_metadata: List containing the prompts used for scoring and the detailed grading results.
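The score for each sample is the proportion of criteria met. The sketch below illustrates that arithmetic in base R. It assumes, hypothetically, that each sample's grading can be reduced to a logical vector with one entry per criterion (TRUE = met). The grades, the helper proportion_met(), and the contents of the metadata list are illustrative only and are not chores_scorer()'s internal format.

```r
# Hypothetical per-sample grades: one logical per criterion (TRUE = met).
# This representation is an assumption for illustration; chores_scorer()'s
# actual grading output may be structured differently.
grades <- list(
  c(TRUE, TRUE, FALSE, TRUE),
  c(TRUE, TRUE, TRUE),
  c(FALSE, FALSE)
)

# Hypothetical helper: proportion of criteria met for one sample.
# mean() of a logical vector counts TRUE as 1 and FALSE as 0.
proportion_met <- function(criteria) {
  if (length(criteria) == 0) return(NA_real_)
  mean(criteria)
}

# One score per sample, each between 0 and 1.
score <- vapply(grades, proportion_met, numeric(1))
print(score)
#> [1] 0.75 1.00 0.00

# A value shaped like the documented return: a numeric score vector plus
# metadata. Here the metadata holds only the hypothetical grades; the real
# .scorer_metadata also contains the prompts used for scoring.
result <- list(
  score = score,
  .scorer_metadata = list(grades = grades)
)
```

Because each score is a proportion, samples graded against different numbers of criteria remain directly comparable on the same 0 to 1 scale.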

See also

chores_dataset for the dataset this scorer evaluates, and chores_task() to combine this scorer with the dataset and solver.