Is this fair?
Corollary: machine learning fairness is not simply a mathematical optimization problem
library(detectors)
library(dplyr)     # pipes and data verbs used below
library(yardstick) # roc_auc() and fairness metrics

str(detectors)
#> tibble [6,185 × 9] (S3: tbl_df/tbl/data.frame)
#> $ kind : Factor w/ 2 levels "AI","Human": 2 2 2 1 1 2 1 1 2 2 ...
#> $ .pred_AI : num [1:6185] 0.999994 0.828145 0.000214 0 0.001784 ...
#> $ .pred_class: Factor w/ 2 levels "AI","Human": 1 1 2 2 2 2 1 2 2 1 ...
#> $ detector : chr [1:6185] "Sapling" "Crossplag" "Crossplag" "ZeroGPT" ...
#> $ native : chr [1:6185] "No" "No" "Yes" NA ...
#> $ name : chr [1:6185] "Real TOEFL" "Real TOEFL" "Real College Essays" "Fake CS224N - GPT3" ...
#> $ model : chr [1:6185] "Human" "Human" "Human" "GPT3" ...
#> $ document_id: num [1:6185] 497 278 294 671 717 855 533 484 781 460 ...
#> $ prompt : chr [1:6185] NA NA NA "Plain" ...
How does a GPT detector behave fairly?
Three perspectives:
Position: it is unfair to pass off an essay written by a GPT as one’s own work.
Stakeholders:
detectors %>%
group_by(detector) %>%
roc_auc(truth = kind, .pred_AI) %>%
arrange(desc(.estimate)) %>%
head(3)
#> # A tibble: 3 × 4
#> detector .metric .estimator .estimate
#> <chr> <chr> <chr> <dbl>
#> 1 GPTZero roc_auc binary 0.750
#> 2 OriginalityAI roc_auc binary 0.682
#> 3 HFOpenAI roc_auc binary 0.614
Note
This code makes no mention of the native variable.
Position: it is unfair to disproportionately classify human-written text as AI-generated
Stakeholders:
The fairness metric equal opportunity quantifies this definition of fairness.
Note
equal_opportunity() is one of several fairness metrics in the development version of yardstick.
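The equal_opportunity_by_native() function used below is not defined in this excerpt. A minimal sketch of how it could be constructed, assuming the development version of yardstick noted above, in which equal_opportunity() creates a groupwise metric from an unquoted column name:

# construct a groupwise metric: equal opportunity, disaggregated by
# whether the essay's author is a native English writer
equal_opportunity_by_native <- equal_opportunity(by = native)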
detectors %>%
filter(kind == "Human") %>%
group_by(detector) %>%
equal_opportunity_by_native(
truth = kind, estimate = .pred_class, event_level = "second"
) %>%
arrange(.estimate) %>%
head(3)
#> # A tibble: 3 × 5
#> detector .metric .by .estimator .estimate
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Crossplag equal_opportunity native binary 0.464
#> 2 ZeroGPT equal_opportunity native binary 0.477
#> 3 GPTZero equal_opportunity native binary 0.510
The detectors with estimates closest to zero are most fair, by this definition of fairness.
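Concretely, with Human as the event of interest (event_level = "second"), equal opportunity compares the rate at which human-written essays are correctly classified as human-written across levels of native. A sketch of the standard definition, noting that yardstick’s exact aggregation across groups may differ slightly:

$$\big|\,\Pr(\hat{Y} = \text{Human} \mid Y = \text{Human},\ \text{native} = \text{Yes}) - \Pr(\hat{Y} = \text{Human} \mid Y = \text{Human},\ \text{native} = \text{No})\,\big|$$

Here $Y$ is the true source of the essay and $\hat{Y}$ is the detector’s prediction; an estimate near zero means native and non-native English writers are about equally likely to have their writing correctly recognized as human-written.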
Position: it is unfair to pass off an essay written by a GPT as one’s own work and it is unfair to disproportionately classify human-written text as AI-generated.
Stakeholders:
Workflow:
Question
By this workflow, which of the first two definitions of fairness is encoded as more important?
Find the most performant detectors:
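The code for this step is not shown in this excerpt; one way to carry it out, reusing the ROC AUC computation from earlier (performant_detectors is the name the next chunk expects):

# take the three detectors with the highest ROC AUC as the
# most performant set
performant_detectors <-
  detectors %>%
  group_by(detector) %>%
  roc_auc(truth = kind, .pred_AI) %>%
  arrange(desc(.estimate)) %>%
  head(3)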
Among the most performant detectors, choose the model that predicts most fairly on human-written essays:
detectors %>%
filter(kind == "Human", detector %in% performant_detectors$detector) %>%
group_by(detector) %>%
equal_opportunity_by_native(
truth = kind,
estimate = .pred_class,
event_level = "second"
) %>%
arrange(.estimate)
#> # A tibble: 3 × 5
#> detector .metric .by .estimator .estimate
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 GPTZero equal_opportunity native binary 0.510
#> 2 HFOpenAI equal_opportunity native binary 0.549
#> 3 OriginalityAI equal_opportunity native binary 0.709
Take-home 📝
Switch the order of these steps. Does this result in a different set of recommended models?
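A sketch of the reversed order, assuming the objects defined above; fair_detectors is a hypothetical name used only for illustration:

# step 1: find the three detectors that predict most fairly
# on human-written essays
fair_detectors <-
  detectors %>%
  filter(kind == "Human") %>%
  group_by(detector) %>%
  equal_opportunity_by_native(
    truth = kind, estimate = .pred_class, event_level = "second"
  ) %>%
  arrange(.estimate) %>%
  head(3)

# step 2: among those, choose the detector with the highest ROC AUC
detectors %>%
  filter(detector %in% fair_detectors$detector) %>%
  group_by(detector) %>%
  roc_auc(truth = kind, .pred_AI) %>%
  arrange(desc(.estimate))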
github.com/simonpcouch/slc-rug-23