Notes on Racial bias in US police stops

Posted on January 26, 2022

A large-scale analysis of racial disparities in police stops across the United States

Pierson, E., Simoiu, C., Overgoor, J., Corbett-Davies, S., Jenson, D., Shoemaker, A., Ramachandran, V., Barghouty, P., Phillips, C., Shroff, R., and Goel, S. (2020). A large-scale analysis of racial disparities in police stops across the United States. Nature Human Behaviour.

The paper aims to show the presence of racial bias in police stop and search decisions in the United States. It also attempts to show how certain policies, namely the legalization of marijuana, affect those decisions.

The motivating research questions are: “Is there racial bias in police enforcement on the road?” and “Does this change after certain policies take effect?”

The authors recognized that the most readily available dataset was probably not representative, so they filed public records requests nationwide to obtain a larger, more representative sample. They also provide the dataset and code for transparency and reproducibility. Nonetheless, some characteristics of this dataset need to be discussed. For instance, the data are incomplete because police officers do not always fill out the records thoroughly. Some data points could also have been recorded incorrectly, which would make the dataset dirty.

A main weakness I find in the paper is that it does not mention the racial composition of drivers before and after dusk. If Black people simply drive less at night than during the day, that alone could explain the drop in stops. The authors also note that per-capita comparisons do not make much sense, since other factors might affect the ratio, but they do not offer a solution to that problem. Finally, I find that too many confounding variables can affect the reason for a police stop, which detracts from the soundness of their results and hampers their interpretation of the data.
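
To make the first concern concrete, here is a minimal arithmetic sketch in Python. Every number in it is hypothetical and chosen only for illustration; none comes from the paper or its dataset, and the function name `black_share_of_stops` is my own. It compares two made-up scenarios: one where Black drivers are stopped at a higher rate in daylight, and one where stop rates are equal but fewer Black drivers are on the road after dusk.

```python
def black_share_of_stops(black_driver_share, stop_rate_black, stop_rate_other):
    """Fraction of all stops that involve Black drivers.

    black_driver_share: fraction of drivers on the road who are Black
    stop_rate_black / stop_rate_other: per-driver probability of being stopped
    """
    black_stops = black_driver_share * stop_rate_black
    other_stops = (1 - black_driver_share) * stop_rate_other
    return black_stops / (black_stops + other_stops)


# Scenario A (hypothetical): the driving population is the same day and night,
# but Black drivers are stopped at 1.5x the rate of others in daylight.
a_day = black_share_of_stops(0.20, stop_rate_black=0.15, stop_rate_other=0.10)
a_night = black_share_of_stops(0.20, stop_rate_black=0.10, stop_rate_other=0.10)

# Scenario B (hypothetical): stop rates are identical for everyone at all times,
# but Black drivers make up a smaller share of drivers after dusk.
b_day = black_share_of_stops(3 / 11, stop_rate_black=0.10, stop_rate_other=0.10)
b_night = black_share_of_stops(0.20, stop_rate_black=0.10, stop_rate_other=0.10)

print(f"Scenario A: day {a_day:.3f}, night {a_night:.3f}")
print(f"Scenario B: day {b_day:.3f}, night {b_night:.3f}")
```

Under these assumed numbers, both scenarios produce the same share of stops involving Black drivers before and after dusk, so the stop counts alone would not tell them apart. Separating the two would require some estimate of who is actually driving at each time of day.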

Lastly, although I agree with the thesis and understand the importance of working on these problems, I think this particular paper is not very robust, which could be why the authors needed to report so many results with confidence intervals and p-values in the text.