
Farhan Samir, PhD

I'm an NSERC Postdoctoral Fellow at the University of Toronto, where I'm part of the DGP Group in the Department of Computer Science. I earned my PhD in the Natural Language Processing (NLP) Group at the University of British Columbia. My research there was also supported by NSERC and UBC’s Public Scholars Initiative. During my PhD, I spent some time at the University of Washington NLP group, and I was a research intern at Ai2 Aristo, NVIDIA, and Amazon’s Lab126 during the summer terms of my program.

The overarching theme of my research is to make information ecosystems and technologies more pluralistic, so that they include voices that have been historically marginalized. To this end, I develop quantitative measures of representational gaps in diverse contexts, from entrenched biases in conversational AI systems, to content gaps in large-scale information management systems, to representational gaps in news coverage. These measures are informed by studies and methods from humanistic inquiry, including critical discourse analysis, ethnographic fieldwork, and archival research, to ensure maximal ecological validity. I am equally committed to a high degree of construct validity, grounded in my training in machine learning and data science.

Announcements

  • April 2026: Check out my op-ed in The Tyee, based on my ACM Websci work. More to come!
  • Feb 2026: Paper accepted to ACM Websci'26, excited to visit Germany again this summer!!
  • Mar 2025: Awarded the 2024-25 Department of Statistics Award in Data Science
  • Feb 2025: Awarded the NSERC Postdoctoral Fellowship!!
  • Jan 2025: My colleague Anjalie Field was profiled about our work on cross-linguistic information gaps in Wikipedia; check it out here

Selected Papers

Connecting the Dots: A Longitudinal Study of Performance Disparities in Automatic Speech Recognition
Alexander Metzger, Aruna Srivastava, Ruslan Mukhamedvaleev, Eunjung Yeo, Syed Ishtiaque Ahmed, Nina Markl, Sachin Kumar, Farhan Samir

In submission

Quantifying Media Representation Dynamics Across 25 Years of News Reporting on Policing-related Deaths
Farhan Samir, Jappun Dhillon, Meghna Ravikumar, Syed Ishtiaque Ahmed, Vered Shwartz

ACM Websci'26

WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
Zining Wang, Yuxuan Zhang, Dongwook Yoon, Nicholas Vincent, Farhan Samir, Vered Shwartz

In submission

Locating Information Gaps and Narrative Inconsistencies Across Languages
Farhan Samir, Chan Young Park, Anjalie Field, Vered Shwartz, Yulia Tsvetkov

Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2024)