Pass this function to Task$new() as the scorer to evaluate model responses based on the chores eval criteria.
Usage
chores_scorer(
  samples,
  ...,
  scorer_chat = ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest")
)
Arguments
- samples
The samples from a solver task, likely retrieved from task$get_samples().
- ...
Additional arguments passed to the scoring function.
- scorer_chat
An ellmer chat object to use for scoring. Defaults to ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest"); this is the scoring model used in "official" results.
Value
A list with the following components:
- score
Numeric vector of scores between 0 and 1, representing the proportion of criteria met.
- .scorer_metadata
List containing the prompts used for scoring and the detailed grading results.
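To make the shape of the return value concrete, here is a minimal sketch that builds a list with the same two components by hand. It does not call chores_scorer() or any model. The criteria_met data, the placeholder prompt strings, and the names inside .scorer_metadata (prompts, grading) are hypothetical and chosen only for illustration.

```r
# Hypothetical grading results: one logical vector per sample,
# TRUE where a criterion was judged as met. Not real eval data.
criteria_met <- list(
  c(TRUE, TRUE, FALSE, TRUE),
  c(TRUE, FALSE),
  c(TRUE, TRUE, TRUE)
)

# Each score is the proportion of criteria met for that sample,
# so every value falls between 0 and 1.
score <- vapply(criteria_met, mean, numeric(1))

# Assemble a list shaped like the documented return value.
# The names inside .scorer_metadata are illustrative assumptions.
result <- list(
  score = score,
  .scorer_metadata = list(
    prompts = rep("<scoring prompt>", length(criteria_met)),
    grading = criteria_met
  )
)

result$score
```

A sample that meets three of four criteria scores 0.75 here, which matches the "proportion of criteria met" reading of the score component.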
See also
chores_dataset for the dataset this scorer evaluates, and chores_task() to combine this scorer with the dataset and solver.
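Examples
A minimal sketch of wiring the scorer into a task by hand, rather than through chores_task(). It assumes that Task is the R6 class from the vitals package, that vitals::generate() is a suitable solver for this dataset, and that the package providing chores_scorer and chores_dataset is attached; none of these is confirmed by this page. The run_chores_eval() helper is hypothetical. Running it calls model APIs and needs the relevant credentials, so the call itself is left commented out.

```r
# Hypothetical helper for illustration; not part of the package.
run_chores_eval <- function(solver_chat) {
  # Assumption: Task comes from vitals, and chores_dataset has the
  # columns that Task$new() expects.
  task <- vitals::Task$new(
    dataset = chores_dataset,
    solver = vitals::generate(solver_chat),
    scorer = chores_scorer
  )

  # Run the solver and then the scorer over every sample.
  task$eval()

  # Return the per-sample results, including the scores.
  task$get_samples()
}

# Requires API credentials and network access:
# samples <- run_chores_eval(ellmer::chat_anthropic())
```

Passing chores_scorer itself, without calling it, matches the description at the top of this page: the task calls the scorer with the solved samples.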