Creates a vitals Task for evaluating language models on the chores dataset. The task combines the dataset, solver, and scorer into a single object. To run the eval, call chores_task()$eval() and pass a solver_chat of your choice.

Usage

chores_task(dir = "data-raw/chores/logs")

Arguments

dir

Character string specifying the directory where evaluation logs will be written. Defaults to "data-raw/chores/logs".

Value

A vitals Task object configured with the chores dataset, solver, and scorer. This object can be used to run evaluations with task$eval().
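
A minimal sketch of running the eval follows. It rests on several assumptions that do not come from this page: that the package exporting chores_task() is loaded as choreseval, that solver_chat accepts an ellmer chat object, that "gpt-4o" is merely a placeholder model, and that an API key for the chosen provider is set in the environment. Treat it as an illustration rather than the canonical workflow.

```r
# Sketch only: package name, provider, and model are assumptions.
library(choreseval)  # assumed name of the package that exports chores_task()
library(ellmer)      # provides chat constructors such as chat_openai()

# Build the task, writing logs to a temporary directory
# instead of the default "data-raw/chores/logs".
task <- chores_task(dir = tempdir())

# Run the eval with a chat object as the solver_chat.
# Requires a valid OPENAI_API_KEY in the environment.
task$eval(solver_chat = chat_openai(model = "gpt-4o"))
```

Pointing dir at a temporary directory keeps exploratory runs from mixing with logs stored under the default path.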

See also

chores_dataset for the dataset used in this task, chores_solver for the solver function, and chores_scorer for the scoring function.