CSC2552 Topics in Computational Social Science: AI, Data, and Society

Fall 2023

Lecture hours: Thu 3-5pm
Lecture link: https://utoronto.zoom.us/j/81812733460 (Passcode available on Quercus)

Instructor: Ashton Anderson, Assistant Professor
Email: ashton [at] cs [dot] toronto [dot] edu
Office location: PT 398C
Office hours: by appointment

Reviews TA: Suhyeon (Sue) Yoo, PhD student
Email: suhyeon [dot] yoo [at] mail [dot] utoronto [dot] ca

Assignment/Projects TA: Harsh Kumar, PhD student
Email: harsh [at] cs [dot] toronto [dot] edu

Important links:

Course News

Course Description

The mass migration of life onto online platforms has ushered in a new era of social research. The recent availability of large-scale datasets of human behaviour and the new potential for massive online experimentation introduce both great opportunities and challenges. In this class, a seminar course on computational social science, we will survey the explosion of research at the intersection of computer science and the social sciences.

The goals of this seminar class are twofold. First, students will become acquainted with computational tools and techniques for processing and thinking about social data, and will be exposed to papers spanning the full spectrum of computational social science methods, from large-scale empirical data analysis to online experimentation. Second, students will develop research skills by reading, reviewing, presenting, and discussing recent academic papers.

Every week, we will cover a different computational social science topic by reading two papers and discussing them. Before class, everyone will write a review of the papers, identifying their research questions, strengths and weaknesses, and connection to other literature. Each paper will be assigned to 2-3 people who will lead a group discussion of it in class. Throughout the term, everyone will get the chance to present one paper.

The major coursework component of the course, besides the weekly reviews, will be a term project. Students will propose a topic, give a presentation on their work, and submit a final report. The project will give students a chance to identify an interesting computational social science problem and investigate it, and could potentially lead to publication in a workshop or conference.

Grading scheme:

Textbook

We will read the book Bit By Bit: Social Research in the Digital Age by Matthew Salganik. It's available online for free, and in print form at a reasonable cost.

Schedule

The following is a tentative schedule of topics we'll cover (subject to change).

Week | Date | Topic | Reviews Due | Textbook Readings
1 | 9/7 | Introduction to computational social science [Slides] [Video] (2nd half) | - | Ch. 1
2 | 9/14 | Introduction to computational social science cont'd [Slides] [Video] | - | Ch. 1
3 | 9/21 | Observational studies 1 [Video] | 9/20 9:00pm | Ch. 2
4 | 9/28 | Observational studies 2 | 9/27 9:00pm | Ch. 2
5 | 10/5 | Experiments 1 | 10/4 9:00pm | Ch. 4
6 | 10/12 | Project proposals | - | -
7 | 10/19 | Experiments 2 | 10/18 9:00pm | Ch. 4
8 | 10/26 | Asking questions | 10/25 9:00pm | Ch. 3
9 | 11/2 | Applying machine learning | 11/1 9:00pm | -
10 | 11/16 | Ethics in computational social science | 11/15 9:00pm | Ch. 6
11 | 11/23 | Project presentations (Part 1) | - | -
12 | 11/30 | Project presentations (Part 2) | - | -


Classes

 

9/21: Observational studies 1

Main papers:

  1. Asymmetric ideological segregation in exposure to political news on Facebook (and online supplement).
    S. González-Bailón et al.
    Science, 2023.
  2. The Diversity–Innovation Paradox in Science (and online supplement).
    B. Hofstra et al.
    Proceedings of the National Academy of Sciences (PNAS), 2020.

Optional background material:

9/28: Observational studies 2

Main papers:

  1. Dissecting racial bias in an algorithm used to manage the health of populations (and online supplement).
    Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan.
    Science, 2019.
  2. Human Decisions and Machine Predictions (and online supplement).
    J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, S. Mullainathan.
    Quarterly Journal of Economics (QJE), 2017.

Optional background material:

10/5: Experiments 1

Main papers:

  1. Shifting attention to accuracy can reduce misinformation online (and online supplement).
    G. Pennycook, Z. Epstein, M. Mosleh, A.A. Arechar, D. Eckles, D.G. Rand.
    Nature, 2021.
  2. Algorithm aversion: People erroneously avoid algorithms after seeing them err.
    B. Dietvorst, J. Simmons, C. Massey.
    Journal of Experimental Psychology, 2014.

10/12: Project proposals

Students will present their project proposals.


10/19: Experiments 2

Main papers:

  1. The Welfare Effects of Social Media.
    H. Allcott, L. Braghieri, S. Eichmeyer, M. Gentzkow.
    American Economic Review, 2020.
  2. Megastudies improve the impact of applied behavioural science (and online supplement).
    K.L. Milkman et al.
    Nature, 2021.

10/26: Asking questions

Main papers:

  1. Machine learning and phone data can improve targeting of humanitarian aid (and online supplement).
    E. Aiken, S. Bellue, D. Karlan, C. Udry, J.E. Blumenstock.
    Nature, 2022.
  2. Users choose to engage with more partisan news than they are exposed to on Google Search (and online supplement).
    R. E. Robertson, J. Green, D. J. Ruck, K. Ognyanova, C. Wilson, D. Lazer.
    Nature, 2023.

11/2: Applying machine learning

Main papers:

  1. Can Large Language Models Transform Computational Social Science?
    C. Ziems, W. Held, O. Shaikh, J. Chen, Z. Zhang, D. Yang.
    Preprint, 2023.
  2. Trucks Don’t Mean Trump: Diagnosing Human Error in Image Analysis.
    J.D. Zamfirescu-Pereira, J. Chen, E. Wen, A. Koenecke, N. Garg, E. Pierson.
    ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022.

11/16: Ethics in computational social science

Main papers:

  1. Gender shades: Intersectional accuracy disparities in commercial gender classification (and online supplement).
    J. Buolamwini and T. Gebru.
    ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2018.
  2. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale.
    F. Bianchi et al.
    ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2023.

11/23 and 11/30: Final project presentations

Students will present their final projects.




Writing Reviews

The main point of this class is to engage with important and cutting-edge research at the interface of computer science and the social sciences. This involves reading, reviewing, discussing, and presenting papers. Reviewing papers for CSC2552 will help you develop your reviewing and critical thinking skills, as well as prepare you for in-class discussions. In what follows, I've written some thoughts on how to write a good review. Every week that we have a discussion class, your paper reviews will be due on Wednesday at 9pm before class.

Breaking papers down. Our structure for thinking about papers in this class can be roughly broken into three components: the "front matter", the "meat", and the "back matter". The front matter consists of high-level motivation ("Why are we studying this domain and these questions? Why should we care about this? What will we gain if we successfully accomplish this paper's goals?"), the research question [RQ] ("What is the high-level research question that motivates this paper?"), and the concrete operationalization [CO] ("What is the answerable question that the researchers actually answer?"). Notice the difference between the RQ and the CO: the RQ is the high-level question, typically too broad and abstract to be completely answered in a single paper, whereas the CO is the question the researchers actually answer, and is usually the RQ with several high-level constructs instantiated ("operationalized"). For example, in the structural virality paper I presented in the first lecture, the motivating research question is: "How does information spread in the world?" and the concrete operationalization is: "How structurally viral are diffusion cascades on Twitter?". In this CO, we've operationalized "information" as "URLs", "the world" as "Twitter", and "spread" as our new metric of "structural virality". Usually, the "front matter" of a paper is written at the front, in the Introduction (and potentially elsewhere, such as the Methods).

The "meat" of the paper is the analysis, where the actual work is reported (e.g. the observational analysis, the results of the experiment, the survey findings, etc.). You'll typically find this in the middle of the paper, and it will normally make up the majority of the content. Common section headers are Results, Analysis, etc.

Finally, the back matter of the paper interprets the paper's findings and discusses their implications. The authors should explicitly discuss how their results answer the research question, and what their results mean for it. They will also usually discuss the paper's limitations and suggest directions for follow-up work.

Review structure. The structure of your review should mirror this 3-part paper structure. First, provide a concise (1-2 sentence) summary of the paper. What is the main point of the paper? This demonstrates that you understood the high-level point of the work, and it's often a useful exercise to circumscribe the domain the paper is exploring. It's important to ensure that your summary is brief — if you can't summarize the point concisely, your understanding of the work probably still isn't crisp enough. Then, explicitly state both the motivating research question that the authors are asking and the concrete operationalization that the authors use in the work. This will comprise the "front matter".

Next, describe the methodological approach that the authors take to answering their CO, and discuss the strengths and weaknesses of this approach. For example, how did the authors' choice of conducting a particular observational analysis play out? These should be major pros and cons, not little nitpicks. It's extremely rare that an approach doesn't have both several strengths and several weaknesses. Research is difficult, and there are almost always tradeoffs. What are the compromises the authors made? What is good about their approach, and what are the limitations? How might one address these limitations?

Finally, discuss the back matter of the work and your thoughts on the paper's contribution. What exactly have the authors shown? Do they answer their research question and address their motivation? Do you agree with their interpretation of the results? What is the difference between the world after this paper and the world without it? Do the strengths of their approach justify the weaknesses? Should the authors have done anything differently, in your opinion? What else should they have done? What are the implications of the results? How does this research inform or complement other work in computational social science, or society in general?

Throughout your review, discuss the computational social science methods used in the framework set out by Salganik in Bit By Bit.

Finally, reviews should be concise. Papers are "big" things; they represent an entire research project that a group of people have spent significant time pursuing. Resist the temptation to address every detail, or go off on a small tangent. Keep to the main and most important points. Reviews should not be longer than 500 words.

The grading rubric for the reviews is as follows:


Leading discussion

Everyone will have the opportunity to lead a class discussion of a paper. Here is some advice on leading the discussion.

Context. The goal of a discussion is for the class to have an in-depth conversation about the paper in order to (1) fully understand all three components of the paper: front matter, meat, back matter, and (2) discuss and discover the tradeoffs associated with the methodology being covered that week. The ideal conversation is a kind of emergent, collective "review" of the paper being co-written in real time by the entire class. Your objective is to design a discussion-leading strategy that sparks this ideal conversation. Remember that everyone in attendance will not only have read the paper but will have written a full review of it. Everyone has their own parts of the collective review to contribute, and you just need to elicit them.

Structure. The class discussion will last approximately 50 minutes, and should mirror the "breaking down papers" structure on the website. First discuss the front matter, then the meat of the paper, then the back matter. Don't recapitulate the paper; this isn't a presentation or a talk. Instead, indicate which part of the paper you want the discussion to centre on and perhaps briefly refresh the audience's memory on the paper contents if need be. Your main contributions will be carefully thought-through, non-obvious questions and prompts that challenge the audience and really get at the heart of things. The laziest way to lead the discussion is simply to go through the structure and ask "What do you think about X?" where X is the motivation, RQ, CO, strengths, weaknesses, interpretations, etc. Please don't do this; instead, go beyond the most obvious questions and ask more incisive ones instead. Great discussion leaders in the past have gone the extra mile by bringing in relevant related work, engaging the audience with the motivation, and even emailing the authors to request missing information.

I will jump in and act as another discussion leader when necessary, but will otherwise let you do the leading. Important things to remember are: You'll be graded on your organization, the depth and coverage of your questions, and the quality of the resulting conversation.

Projects

One of the main goals of CSC2552 is to introduce you to research in computational social science. The best way to do that is to get your hands dirty and try to study something yourself, and the final project offers you an opportunity to do exactly this.

The project has two main deliverables: the project proposal and the final report. Students will present each to the class (proposals on October 12 and final reports on November 23 and November 30). Projects will be done in teams of one or two people.

Your first task is to pick a project topic. If you are looking for project ideas, please email me, and I'd be happy to brainstorm and suggest some project ideas. Also check this resources page.

Project proposal (2 pages). Your proposal should outline the motivation for your project, the realistic research question you wish to answer, and your chosen operationalization. Articulate these as crisply as you can. Next, survey a bit of related work (around 1 paragraph). The proposal should identify the CSS methodology you will use and lay out a plan for your project. How exactly do you plan to pursue your research questions? What data/methods will you use? You should provide a concrete proposal for a data analysis, experiment, survey, or other method that helps you answer your research question. The analysis plan will probably be the biggest section, at least 3/4 page. Finally, mention any of the steps you have taken so far and share your preliminary progress. Ideally, the first 2 pages of your final report will be quite similar to this proposal.

Proposal components:



Acknowledgments

Thanks to Sharad Goel, Jon Kleinberg, Jure Leskovec, Matt Salganik, Johan Ugander, and Bob West for course advice and inspiration.