The chores evaluation task

Creates a vitals Task for evaluating language models on the chores dataset. The task bundles the dataset, solver, and scorer into a single object. To run the eval, call chores_task()$eval(), supplying a solver_chat of your choice.

Usage

chores_task(dir = "data-raw/chores/logs")

Arguments

dir: Character string specifying the directory where evaluation logs will be written.

Value

A vitals Task object configured with the chores dataset, solver, and scorer. This object can be used to run evaluations with task$eval().
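
The workflow described above might look like the following minimal sketch. It assumes that the package providing chores_task() is already attached, that the ellmer package is installed, and that an Anthropic API key is available in the environment. The choice of chat_anthropic() and of tempdir() as the log directory are illustrative assumptions, not requirements of the task.

```r
# Sketch only: assumes chores_task() is available from an attached package
# and that ANTHROPIC_API_KEY is set for ellmer.
library(ellmer)

# Write logs to a temporary directory (an illustrative choice; the default
# is "data-raw/chores/logs").
tsk <- chores_task(dir = tempdir())

# Run the evaluation, supplying an ellmer chat object as the solver_chat.
# Any other ellmer chat constructor could be substituted here.
tsk$eval(solver_chat = chat_anthropic())
```

Any other ellmer chat object should work in place of chat_anthropic(), which is what makes it possible to compare models on the same task.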
