Observational Fairness and the COMPAS dataset

About the Mini-Project

This mini-project is to be completed individually. That is because we want every student in the class to get hands-on experience with PyTorch early in the semester, and hands-on experience with writing code for dealing with messy data.

You must use PyTorch when developing the code for this mini-project. You may use scikit-learn to compute quantities such as false-positive rates, though that is neither necessary nor encouraged.

In this mini-project, you will reproduce results from two scientific papers. There will be multiple reasonable choices available to you as you interpret the papers – that is normal.

Plagiarism warning

Your submissions will be checked for plagiarism. You may discuss general issues with your classmates, but you may not share your code or look at other people’s code. Please avoid uploading your code publicly to GitHub (or similar services).

What to submit

You will submit a Jupyter notebook, with all your code. The results you report need to be reproducible – every figure/number you report should be computed by the code in the Jupyter notebook you submit.

The Dataset

You will be working with the COMPAS dataset, available at https://github.com/propublica/compas-analysis/blob/master/compas-scores-two-years.csv
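
For reference, here is a minimal sketch of one way to load and filter the data with pandas, assuming you have downloaded the CSV locally (the file path is a placeholder). The filters shown follow ProPublica’s own analysis; whatever filtering you choose, state it explicitly in your notebook.

    import pandas as pd

    # Path is a placeholder; point it at your local copy of the CSV linked above.
    df = pd.read_csv("compas-scores-two-years.csv")

    # Filters used in ProPublica's analysis:
    #  - COMPAS screening within 30 days of the arrest,
    #  - recidivism outcome is known (is_recid != -1),
    #  - not an ordinary traffic offense (c_charge_degree != "O").
    df = df[
        df["days_b_screening_arrest"].between(-30, 30)
        & (df["is_recid"] != -1)
        & (df["c_charge_degree"] != "O")
    ]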

You will be reproducing (parts of) two analyses:

Dressel, Julia, and Hany Farid. “The accuracy, fairness, and limits of predicting recidivism.” Science Advances 4.1 (2018) https://www.science.org/doi/10.1126/sciadv.aao5580

Wadsworth, Christina, Francesca Vera, and Chris Piech. “Achieving fairness through adversarial learning: an application to recidivism prediction.” In Proc. Workshop Fairness, Accountability, Transparency Mach. Learn., (2018). http://web.stanford.edu/~cpiech/bio/papers/fairnessAdversary.pdf

Part 1 (30%)

Build a logistic regression model to predict two-year recidivism. Show that your model satisfies calibration, but fails to satisfy false-positive parity. Relate this to the difference in base rates in the dataset.
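
To make the requirements concrete, here is a minimal sketch of a PyTorch logistic regression and of the per-group quantities involved. The data tensors below are random placeholders that you would replace with your processed COMPAS features, and the single-threshold comparison of P(recidivism | predicted positive) is only a coarse proxy for calibration (you may prefer to check calibration within score bins).

    import numpy as np
    import torch
    import torch.nn as nn

    # Placeholders; replace with tensors built from your train/test split of the COMPAS data.
    n_train, n_test, d = 800, 200, 8
    X_train = torch.randn(n_train, d)
    y_train = torch.randint(0, 2, (n_train,)).float()
    X_test = torch.randn(n_test, d)
    y_test = torch.randint(0, 2, (n_test,)).float()
    race_test = np.random.choice(["African-American", "Caucasian"], size=n_test)

    # Logistic regression = one linear layer trained with the binary cross-entropy loss.
    model = nn.Linear(d, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()

    for epoch in range(500):
        optimizer.zero_grad()
        loss = loss_fn(model(X_train).squeeze(1), y_train)
        loss.backward()
        optimizer.step()

    # Predicted probabilities and 0/1 predictions on the test set.
    with torch.no_grad():
        p = torch.sigmoid(model(X_test).squeeze(1)).numpy()
    pred = (p >= 0.5).astype(int)
    y = y_test.numpy()

    def fpr(pred, y):
        # false positives / actual negatives
        return ((pred == 1) & (y == 0)).sum() / max((y == 0).sum(), 1)

    for group in np.unique(race_test):
        m = race_test == group
        print(group,
              "base rate:", y[m].mean(),
              "FPR:", fpr(pred[m], y[m]),
              "P(recidivism | predicted positive):", y[m][pred[m] == 1].mean())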

Following the paper of Corbett-Davies and Goel, show that adjusting the thresholds can lead to an algorithm that does not satisfy calibration, but does satisfy false-positive parity.
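
One way to operationalize the threshold adjustment is to give each group its own decision threshold and re-run the checks above. A sketch, continuing with p, pred, and race_test from the previous block; the threshold values here are arbitrary placeholders that you would tune until the false-positive rates match.

    # Placeholder per-group thresholds; search for values that equalize the FPRs.
    thresholds = {"African-American": 0.6, "Caucasian": 0.4}

    pred_adj = np.zeros_like(pred)
    for group, t in thresholds.items():
        m = race_test == group
        pred_adj[m] = (p[m] >= t).astype(int)

    # Re-compute FPR and P(recidivism | predicted positive) per group using pred_adj.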

Grading scheme

  • Trains a logistic regression model with the required properties (20%)

  • Computes what’s required to check FPR parity and calibration, and discusses the results correctly (10%)

Part 2 (70%)

Wadsworth et al. claim that it is possible to produce a more accurate classifier that satisfies properties such as false-positive parity by using more features and an adversarial learning procedure.

Write PyTorch code to implement the adversarial learning procedure. Note that the procedure needs to be iterative: the adversary is optimized, and then the whole network N is optimized, in a loop.
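
To illustrate the structure of such a loop (not the exact architecture or hyperparameters of Wadsworth et al.), here is a minimal sketch assuming a predictor that outputs a recidivism logit and an adversary that tries to recover the protected attribute from that output. The data, layer sizes, the weight alpha on the adversarial term, and the number of adversary steps are all placeholders that you would have to choose and justify.

    import torch
    import torch.nn as nn

    # Placeholders; replace with your processed features, labels, and protected attribute.
    n, d = 1000, 10
    X = torch.randn(n, d)
    y = torch.randint(0, 2, (n,)).float()        # two-year recidivism labels
    a = torch.randint(0, 2, (n,)).float()        # protected attribute (binarized)

    predictor = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
    adversary = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    alpha = 1.0          # weight on the adversarial term; a hyperparameter to tune
    adv_steps = 5        # inner adversary updates per outer step; also a choice

    for epoch in range(200):
        # (1) Train the adversary to predict the protected attribute from the predictor's output.
        for _ in range(adv_steps):
            opt_a.zero_grad()
            out = predictor(X).detach()
            adv_loss = bce(adversary(out).squeeze(1), a)
            adv_loss.backward()
            opt_a.step()

        # (2) Train the predictor to fit the labels while fooling the (now frozen) adversary.
        opt_p.zero_grad()
        out = predictor(X)
        loss = bce(out.squeeze(1), y) - alpha * bce(adversary(out).squeeze(1), a)
        loss.backward()
        opt_p.step()

The sign of the adversarial term (subtracting the adversary’s loss) encodes the minimax objective: the predictor is rewarded when the adversary cannot recover the protected attribute from its output.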

Report your results. Which features did you use, and how did you use them? How did you process them? Do your experiments confirm that using Wadsworth et al.’s network with more features produces a more accurate classifier?

Do you observe that the false-positive disparity becomes small when using Wadsworth et al.’s method? What did you try in order to observe this result?

Grading scheme

  • A good report on what was done (which features were used and how, how the training was done, etc.) (20%)

  • A good attempt to reproduce Wadsworth et al. (30%), with progress toward showing that FPR parity is satisfied without harming accuracy (20%)