Clinical care data from 130 U.S. hospitals in the years 1999-2008. Each row describes an "encounter" with a patient with diabetes, including variables on demographics, medications, patient history, diagnostics, payment, and readmission.

readmission

Format

A data frame with 71,515 rows and 12 columns:

readmitted

Whether the patient was readmitted within the 30 days following discharge. A factor with levels "Yes" and "No".

race

Reported race of the patient. Source data does not document data collection strategy. A factor with levels "African American", "Asian", "Caucasian", "Hispanic", "Other", and "Unknown".

sex

Reported sex of the patient. Source data does not document data collection strategy. A factor with levels "Female" and "Male".

age

Age range for the patient, binned in 10-year intervals. A factor with levels "[0-10)", "[10-20)", "[20-30)", "[30-40)", "[40-50)", "[50-60)", "[60-70)", "[70-80)", "[80-90)", and "[90-100)".

admission_source

Whether the patient was referred from a physician, admitted via the ER, or arrived via some other source. A factor with levels "Emergency", "Other", and "Referral".

blood_glucose

Results from an A1C test, estimating the patient's average blood sugar over the past 2-3 months. Higher estimated average blood glucose levels are linked to diabetes complications. A factor with levels "Normal", "High", and "Very High"; many values are missing.

insurer

The health insurance provider (or lack thereof, via "Self-Pay") for the patient. A factor with levels "Medicaid", "Medicare", "Private", and "Self-Pay"; many values are missing.

duration

Number of days in the hospital between admission and discharge.

n_previous_visits

Number of emergency, inpatient, and outpatient visits in the year preceding the encounter.

n_diagnoses

"Number of diagnoses entered to the system" during the encounter.

n_procedures

"Number of procedures (other than lab tests) performed" during the encounter.

n_medications

"Number of distinct generic names administered" during the encounter.

Source

Original source data from the following paper (CC BY 3.0):

Strack, B., DeShazo, J. P., Gennings, C., Olmo, J. L., Ventura, S., Cios, K. J., & Clore, J. N. 2014. Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Research International, 781670. doi:10.1155/2014/781670.

Shared freely through the UCI Machine Learning Repository (CC BY 4.0):

Clore, J., Cios, K., DeShazo, J. P., and Strack, B. 2014. Diabetes 130-US hospitals for years 1999-2008. UCI Machine Learning Repository. doi:10.24432/C5230J.

Downloaded from resources shared by the Fairlearn team (MIT):

Weerts, H., Dudík, M., Edgar, R., Jalali, A., Lutz, R., & Madaio, M. 2023. Fairlearn: Assessing and Improving Fairness of AI Systems. Journal of Machine Learning Research, 24(257):1-8.

Examples


str(readmission)
#> tibble [71,515 × 12] (S3: tbl_df/tbl/data.frame)
#>  $ readmitted       : Factor w/ 2 levels "Yes","No": 1 2 1 2 2 2 1 2 2 2 ...
#>  $ race             : Factor w/ 6 levels "African American",..: 1 3 3 3 3 3 1 3 3 3 ...
#>  $ sex              : Factor w/ 2 levels "Female","Male": 2 1 1 1 1 2 1 1 2 1 ...
#>  $ age              : Factor w/ 10 levels "[0-10)","[10-20)",..: 7 6 8 9 8 6 8 3 7 9 ...
#>  $ admission_source : Factor w/ 3 levels "Emergency","Other",..: 3 1 3 3 3 1 3 1 2 3 ...
#>  $ blood_glucose    : Factor w/ 3 levels "Normal","High",..: NA 1 NA NA NA 3 NA NA NA NA ...
#>  $ insurer          : Factor w/ 4 levels "Medicaid","Medicare",..: NA 3 2 3 NA NA 3 NA NA 2 ...
#>  $ duration         : num [1:71515] 7 4 5 5 4 2 3 1 12 1 ...
#>  $ n_previous_visits: num [1:71515] 2 0 2 0 0 0 0 7 0 0 ...
#>  $ n_diagnoses      : num [1:71515] 4 9 9 9 5 2 9 9 9 4 ...
#>  $ n_procedures     : num [1:71515] 0 0 0 3 1 3 3 0 2 2 ...
#>  $ n_medications    : num [1:71515] 16 15 14 26 15 25 22 10 17 6 ...

head(readmission)
#> # A tibble: 6 × 12
#>   readmitted race    sex   age   admission_source blood_glucose insurer duration
#>   <fct>      <fct>   <fct> <fct> <fct>            <fct>         <fct>      <dbl>
#> 1 Yes        Africa… Male  [60-… Referral         NA            NA             7
#> 2 No         Caucas… Fema… [50-… Emergency        Normal        Private        4
#> 3 Yes        Caucas… Fema… [70-… Referral         NA            Medica…        5
#> 4 No         Caucas… Fema… [80-… Referral         NA            Private        5
#> 5 No         Caucas… Fema… [70-… Referral         NA            NA             4
#> 6 No         Caucas… Male  [50-… Emergency        Very High     NA             2
#> # ℹ 4 more variables: n_previous_visits <dbl>, n_diagnoses <dbl>,
#> #   n_procedures <dbl>, n_medications <dbl>
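
Because blood_glucose and insurer are documented as having many missing values, it can help to quantify the missingness before modelling and to see how the readmission rate varies across a grouping column. The following is a minimal base R sketch, not part of the original examples. The helper names missing_share() and readmit_rate_by() are hypothetical and introduced only for illustration; they assume a data frame with a readmitted factor whose levels include "Yes", as described under Format.

```r
# Hypothetical helper: proportion of missing values in each column.
missing_share <- function(data) {
  colMeans(is.na(data))
}

# Hypothetical helper: share of encounters with readmitted == "Yes" within
# each level of a grouping column. Missing values in the grouping column are
# kept as their own group rather than dropped.
readmit_rate_by <- function(data, group) {
  tab <- table(data[[group]], data[["readmitted"]], useNA = "ifany")
  prop.table(tab, margin = 1)[, "Yes"]
}
```

With the data available in the session, missing_share(readmission) would give the per-column share of missing values, and readmit_rate_by(readmission, "blood_glucose") the readmission rate for each A1C result, including the group with no recorded result.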