Title slide, reading "Fair machine learning with tidymodels," my name, Simon P. Couch, and my affiliation, Posit PBC. To the right of the text are six hexagonal stickers showing packages from the tidymodels ecosystem.
ggplot2 line plot with the title 'Predicted versus Actual Vase Weight.' The subtitle reads 'Machine learning model predicts lighter vases are too heavy, and vice versa.'

Is this fair?

The exact same line plot with the title and subtitle replaced. They now read 'Predicted vs. Actual Home Value' and 'Tax assessment model overvalues cheaper homes, and vice versa.'

Is this fair?

Fairness is about our beliefs

The same model parameters can result in behavior that feels totally benign when situated in one context and deeply unjust in another.

A set of hexagonal stickers. The top sticker, which is also larger than the rest, is labeled tidymodels, while the rest contain package names like parsnip and yardstick.
An animated GIF of a news headline from the Chicago Tribune titled "An Unfair Burden." A video of a drive through a South Side Chicago neighborhood plays in the background.

Defining fairness

The same tax assessment plot from earlier.

Proposal: The most performant model is the most fair model.

Proposal: Performant models are fair

Apply calibration to correct for correlated errors.
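
One way to do this with tidymodels is via the probably package. A minimal sketch, assuming home_preds is a hypothetical data frame of out-of-sample predictions with columns assessed (the observed home value) and .pred (the model's prediction):

  library(probably)

  # Learn a calibration mapping from out-of-sample predictions;
  # `home_preds`, `assessed`, and `.pred` are hypothetical names
  cal <- cal_estimate_linear(home_preds, truth = assessed, estimate = .pred)

  # Re-map the predictions so that errors are no longer correlated
  # with the observed outcome
  home_preds_cal <- cal_apply(home_preds, cal)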

Proposal: Performant models are fair

A dot plot of observed vs. predicted home values, except that the predicted values are now centered around the identity line (representing perfect predictions). Errors are i.i.d.

Perhaps this is better?

Proposal: Performant models are fair

Let’s plot the errors:

A plot of the distribution of errors in the previous plot: errors follow a similar distribution across the distribution of property values.

…let’s not forget those y-axis units.

Proposal: Performant models are fair

Let’s plot the errors:

The same plot as before, but the y-axis is now labeled in tens of thousands of dollars.

…let’s not forget those y-axis units.

Proposal: Similar percentage error is most fair

Predictions from a similarly well-calibrated model whose percentage error is consistent across the outcome distribution: assessments for more expensive homes have larger absolute errors.
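
To check a model against this proposal, one might compare absolute and percentage error across the outcome distribution. A sketch with yardstick, reusing the hypothetical home_preds_cal from the calibration step (the five-bin split is an arbitrary choice):

  library(dplyr)
  library(yardstick)

  # Bin predictions by observed home value, then compute mean absolute
  # error and mean absolute percentage error within each bin
  home_preds_cal |>
    mutate(value_bin = ntile(assessed, 5)) |>
    group_by(value_bin) |>
    summarize(
      mae  = mae_vec(assessed, .pred),
      mape = mape_vec(assessed, .pred)
    )

Under this proposal, mape should stay roughly constant across bins even as mae grows with home value.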

Defining fairness

Definitions of fairness "are not mathematically or morally compatible in general."¹
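
yardstick ships with several of these competing definitions as groupwise metrics. A minimal sketch, assuming preds is a data frame of class predictions and group is a sensitive attribute (both hypothetical):

  library(yardstick)

  # Each metric encodes a different, generally incompatible notion of
  # fairness across the levels of the hypothetical `group` column
  fairness_metrics <- metric_set(
    demographic_parity(group),
    equal_opportunity(group),
    equalized_odds(group)
  )

  fairness_metrics(preds, truth = outcome, estimate = .pred_class)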

The whole system

How will these predictions even be used, though?

  • Fixed Percentage: All properties taxed at 0.9% of assessed value
  • Homeowner Exemption: For owner-occupants, the first xyz is not taxed, then 0.9% of assessed value (sketched below)
  • Transitioning from regressivity: Is a rapid change in assessment unfair?
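
A sketch of the first two schemes with made-up numbers; the exemption amount below is a placeholder, not the actual figure:

  assessed  <- 250000  # hypothetical assessed home value, in dollars
  exemption <- 10000   # placeholder homeowner exemption amount

  # Fixed percentage: all properties taxed at 0.9% of assessed value
  tax_fixed <- 0.009 * assessed                      # 2250

  # Homeowner exemption: the exempt amount is untaxed, then 0.9% applies
  tax_owner <- 0.009 * max(assessed - exemption, 0)  # 2160

The same model prediction can thus translate into different tax bills depending on the policy wrapped around it.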

The whole system

Metrics evaluate the model, but the model is one part of a larger system.

The hard parts

  • Articulating what fairness means to you (or stakeholders) in a problem context
  • Choosing a mathematical measure of fairness that speaks to that meaning (see the sketch below)
  • Situating the resulting measure in the whole system
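
For the second point, yardstick's new_groupwise_metric() supports encoding your own definition rather than defaulting to a prepackaged one. A hedged sketch following the pattern in the package documentation; the metric name and data are hypothetical:

  library(yardstick)

  # Define fairness as the spread of accuracy across groups: 0 means
  # the model performs identically for every group
  accuracy_parity <- new_groupwise_metric(
    fn = accuracy,
    name = "accuracy_parity",
    aggregate = function(x) diff(range(x$.estimate))
  )

  # `preds`, `group`, `outcome`, and `.pred_class` are hypothetical
  acc_parity <- metric_set(accuracy_parity(group))
  acc_parity(preds, truth = outcome, estimate = .pred_class)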

Choose tools that support thinking about the hard parts.

Resources

  • tmwr.org

The book cover for "Tidy Modeling with R."

Resources

  • tmwr.org
  • tidymodels.org

Screenshot of an analysis titled "Are GPT detectors fair?"

Resources

  • tmwr.org
  • tidymodels.org

Screenshot of an analysis titled "Are GPT detectors fair?"

Screenshot of an analysis titled "Fair prediction of hospital readmission?"

Resources

  • tmwr.org
  • tidymodels.org
  • Slides and example notebooks: github.com/simonpcouch/cascadia-24