
A predictive modeling problem

How long does it take to tune a boosted tree model on my laptop?

Approach	Area under ROC	Elapsed time
Default engine + grid search	0.8957	3.68h
Optimized engine + search strategy	0.8954	1.52m

Virtually indistinguishable performance in 0.7% of the time.
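The 0.7% figure follows from the two elapsed times in the table. A quick check in base R, using only the numbers reported above (the variable names are illustrative):

```r
# Elapsed times from the table, converted to minutes.
default_minutes   <- 3.68 * 60  # 3.68 hours
optimized_minutes <- 1.52       # 1.52 minutes

# Share of the original runtime, as a percentage.
pct_of_original <- 100 * optimized_minutes / default_minutes
round(pct_of_original, 1)
#> [1] 0.7

# Equivalent speedup factor.
speedup <- default_minutes / optimized_minutes
round(speedup)
#> [1] 145
```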

Quickly, some background:

Here’s our tuning process visualized similarly:

Sequentially:

In parallel:

In tidymodels, this is one added line of code:

plan(multisession, workers = 4)
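For context, a minimal sketch of where that line sits in a script. It assumes the future package is installed and that the tuning code is called after the plan is set; the choice of 4 workers comes from the slide, and resetting the plan at the end is an optional habit rather than something the talk prescribes.

```r
library(future)

# Start 4 background R sessions. Tuning functions called after this
# point can distribute model fits across them.
plan(multisession, workers = 4)

# Confirm how many workers the current plan provides.
n_workers <- nbrOfWorkers()

# ... tuning code goes here ...

# Return to sequential processing when finished.
plan(sequential)
```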

Before:

With a carefully chosen modeling engine:

In tidymodels, this is one changed line of code. From:

spec <- boost_tree(engine = "xgboost")

To:

spec <- boost_tree(engine = "lightgbm")
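In practice the "lightgbm" engine is registered by the bonsai package, so that package needs to be loaded (and lightgbm installed) before fitting. A slightly fuller sketch follows; the classification mode and the two parameters marked with `tune()` are assumptions made for illustration, not details taken from the talk.

```r
library(tidymodels)
library(bonsai)  # registers the "lightgbm" engine for boost_tree()

# Assumed for illustration: a classification problem (the talk reports
# area under the ROC curve) with two parameters marked for tuning.
spec <- boost_tree(engine = "lightgbm", trees = tune(), learn_rate = tune()) %>%
  set_mode("classification")

spec
```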

Before:

Fitting a third as many models:

In tidymodels, this is a few added lines of code:

set.seed(1)
# Build a regular grid: 4 levels of each parameter marked with tune()
spec_grid <- spec %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 4)

In some cases, this “just works” with no changes.
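To see what a regular grid looks like, here is a small sketch. The spec below is hypothetical: which parameters are tuned is an assumption, since the slides do not show it. One plausible reading of "fitting fewer models" is that, for boosted trees, predictions for smaller values of `trees` can often be derived from a single fit at the largest value, so a regular grid that reuses the same values of the other parameters needs fewer distinct fits than it has rows.

```r
library(tidymodels)

# Hypothetical spec: the tuned parameters are an assumption.
spec <- boost_tree(trees = tune(), learn_rate = tune()) %>%
  set_mode("classification")

spec_grid <- spec %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 4)

# 4 levels of each of 2 parameters gives 16 candidate combinations...
n_candidates <- nrow(spec_grid)

# ...but only 4 distinct learning rates appear among them.
n_learn_rates <- n_distinct(spec_grid$learn_rate)
```

The grid is then supplied to the tuning function through its `grid` argument, for example `tune_grid(spec, ..., grid = spec_grid)`.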

Before:

Giving up on poorly performing models early:

In tidymodels, this is one changed line of code. From:

results <- tune_grid(spec, ...)

To:

results <- tune_race_anova(spec, ...)
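`tune_race_anova()` lives in the finetune package. A minimal sketch of configuring the race is below; the values shown are the package defaults as far as I know, and `race_ctrl` is an illustrative name, not one used in the talk.

```r
library(tidymodels)
library(finetune)  # provides tune_race_anova() and control_race()

# Racing evaluates every candidate on a few initial resamples (burn_in),
# then drops candidates that an ANOVA model flags as unlikely to be the
# best at significance level alpha.
race_ctrl <- control_race(burn_in = 3, alpha = 0.05, verbose_elim = TRUE)
```

The control object would be passed along with the other arguments, as in `tune_race_anova(spec, ..., control = race_ctrl)`.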

We went from:

To:

The book cover for "Tidy Modeling with R."

github.com/simonpcouch/rpharma-24