Title slide, reading "Fair machine learning with tidymodels," my name, Simon P. Couch, and my affiliation, Posit Software, PBC. To the right of the text are six hexagonal stickers showing packages from the tidymodels.

A predictive modeling problem

  • Binary outcome (“yes” or “no”)

  • 100,000 rows, 18 columns

  • Mix of numeric and categorical predictors

How long does it take to tune a boosted tree model on my laptop?

Two modeling approaches

Approach                            Area under ROC  Elapsed time
Default engine + grid search        0.8957          3.68 hours
Optimized engine + search strategy  0.8954          1.52 minutes

Virtually indistinguishable performance in 0.7% of the time.
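The 0.7% figure follows directly from the two elapsed times. A minimal check in base R, assuming "h" means hours and "m" means minutes (the variable names are illustrative only):

```r
# Elapsed times as reported for the two approaches, converted to minutes.
baseline_minutes  <- 3.68 * 60  # 3.68 hours
optimized_minutes <- 1.52       # 1.52 minutes

# Share of the original time that the optimized approach needs.
fraction <- optimized_minutes / baseline_minutes

# Equivalent speedup factor.
speedup <- baseline_minutes / optimized_minutes

round(100 * fraction, 1)  # 0.7 (percent of the original time)
round(speedup)            # 145 (times faster)
```

Put another way, the optimized approach is roughly 145 times faster while giving up 0.0003 in area under the ROC curve.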

How did we do it?

Quickly, some background:

Here’s our tuning process visualized similarly:

Distributing computations

Sequentially:

In parallel:

In tidymodels, this is one added line of code:


plan(multisession, workers = 4)
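To see what that line does on its own, here is a small sketch that uses only the future package. It assumes future is installed. It uses two workers rather than four so that it runs on small machines, and the names main_pid and worker_pid are illustrative. It does not call any tidymodels function; it only shows that work submitted after plan() runs in a separate R session.

```r
library(future)

# Process ID of the main R session, for comparison.
main_pid <- Sys.getpid()

# Register background R sessions as workers (two here; the slide uses four).
plan(multisession, workers = 2)

# A future created now is evaluated in one of those background sessions.
worker_pid <- value(future(Sys.getpid()))

# TRUE when the computation ran outside the main session.
ran_elsewhere <- worker_pid != main_pid
ran_elsewhere

# Return to sequential evaluation, which also shuts the workers down.
plan(sequential)
```

Tuning code that runs after plan() needs no other change, which is why the slide counts this as a single added line.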

Non-default modeling engine

Before:

With a carefully chosen modeling engine:

In tidymodels, this is one changed line of code. From:

spec <- boost_tree(engine = "xgboost")


To:

spec <- boost_tree(engine = "lightgbm")
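A slightly fuller sketch of the same change, under a few assumptions that are not on the slide: the bonsai package is attached (it is what registers the "lightgbm" engine for boost_tree()), the mode is set to classification because the outcome is binary, and trees and learn_rate are the parameters marked for tuning.

```r
library(tidymodels)
library(bonsai)  # registers the "lightgbm" engine for boost_tree()

# Same model type as before; only the engine differs.
# The tuned parameters and the mode are assumptions made for this sketch.
spec <- boost_tree(trees = tune(), learn_rate = tune()) %>%
  set_engine("lightgbm") %>%
  set_mode("classification")

spec$engine
```

Because the model type stays boost_tree(), the rest of the tuning code does not need to know which engine sits underneath.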

Submodel trick

Before:

Fitting a third as many models:

In tidymodels, this is a few added lines of code:


set.seed(1)
spec_grid <- spec %>%
  extract_parameter_set_dials() %>% 
  grid_regular(levels = 4)


In some cases, this “just works” with no changes.
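A sketch of why a regular grid helps, assuming a specification that tunes only trees and learn_rate (an assumption; the talk's actual parameter set is not shown). A boosted tree fit with many trees can also predict as if it had fewer, so candidates that differ only in trees can share one fit. A regular grid repeats each learn_rate value across several trees values, which is what makes that sharing possible.

```r
library(tidymodels)

# Hypothetical specification: two tuning parameters only.
spec <- boost_tree(trees = tune(), learn_rate = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

# Regular grid: every combination of 4 trees values and 4 learn_rate values.
spec_grid <- spec %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 4)

# Number of candidate models per resample.
n_candidates <- nrow(spec_grid)

# Candidates that share a learn_rate can reuse the fit with the most trees,
# so only one fit per distinct learn_rate is needed.
n_fits <- nrow(distinct(spec_grid, learn_rate))

c(candidates = n_candidates, fits = n_fits)  # 16 candidates, 4 fits
```

In this toy grid the saving is a factor of four. The exact ratio in any real run depends on which parameters are tuned and how the grid is laid out.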

Racing

Before:

Giving up on poorly performing models early:

In tidymodels, this is one changed line of code. From:

results <- tune_grid(spec, ...)


To:

results <- tune_race_anova(spec, ...)
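An end-to-end sketch of what the elided arguments might look like. Everything beyond the function name is an assumption: the data are a small simulated stand-in from modeldata rather than the 100,000-row dataset from the talk, the formula, folds, and grid are illustrative, and the xgboost, finetune, and lme4 packages are assumed to be installed. tune_race_anova() comes from the finetune package.

```r
library(tidymodels)
library(finetune)  # provides tune_race_anova()

set.seed(1)

# Small simulated binary-outcome data (a stand-in, not the talk's data).
dat   <- sim_classification(500)
folds <- vfold_cv(dat, v = 10)

# Hypothetical specification and regular grid.
spec <- boost_tree(trees = tune(), learn_rate = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

spec_grid <- spec %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 4)

# Racing: after a few initial resamples, candidates that are clearly
# worse than the current best are dropped instead of being evaluated
# on every remaining resample.
results <- tune_race_anova(
  spec,
  class ~ .,
  resamples = folds,
  grid = spec_grid,
  metrics = metric_set(roc_auc)
)

show_best(results, metric = "roc_auc")
```

The call takes the same main arguments as tune_grid(), so swapping the function name is the only change needed in existing tuning code.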

Optimizations, altogether

We went from:

To:

Resources

  • tmwr.org
  • emlwr.org
  • Slides and resources: github.com/simonpcouch/rpharma-24

The book cover for "Tidy Modeling with R."