CSC2412 Algorithms for Private Data Analysis (Fall 2019)

Course Information

Instructor Aleksandar Nikolov
Office: Sandford Fleming 2301B
Location Bissel 112 (BL 112)
Time Thursday, 2-4pm
Office hours by appointment.
Reading As a reference we will use the monograph by Dwork and Roth (from Aaron Roth's website).
We will also use the tutorial notes by Salil Vadhan.
There will be lecture notes posted for some of the lectures.
Grading Based on two problem sets (20%), a written project (50%), and a presentation (30%).
Discussion Forum Feel free to sign up for and use Piazza.

Course Description

The practical applicability of data analysis to sciences and decision-making is often limited by individual privacy concerns. Some of the most interesting data sets - medical studies, genetic data, movie reviews, search logs, social network connections - contain sensitive information about people represented in the data set. If individuals do not think their privacy concerns are addressed, they may refuse to participate in data collection, or not answer surveys truthfully. These issues limit the validity of the data analysis results, and make balancing privacy and usefulness essential to many applications of big data.

In this course we will study privacy in data analysis from a rigorous theoretical perspective. We will focus on Differential Privacy: a recent approach to achieving strong provable privacy guarantees in the analysis of sensitive data. Informally, a data analysis algorithm is differentially private if changing the data of a single individual changes the output distribution of the algorithm only slightly. This guarantee ensures that the privacy risk to any individual increases only slightly by participating in data collection. Our focus will be on the design of efficient differentially private algorithms. In the process, we will learn about fascinating connections between differential privacy and machine learning, geometry, and game theory.

While we focus on the algorithms, there will be proofs. It is important to prove formally that an algorithm provides privacy: this is not something that can be verified experimentally. Most proofs are simple, but the course does require mathematical maturity and a background in probability theory, and the design and analysis of algorithms.

Problem Sets

Problem Set 1 is posted. It is due in three weeks, on October 25, by 11:59pm. You should submit via MarkUs at See the problem set file for more detailed instructions.

Problem Set Solutions

Problem Set 2 is posted. It is due in two weeks, on November 29, by 11:59pm. You should submit via MarkUs at See the problem set file for more detailed instructions.

Class Projects

General Information

At the end of the semester you need to submit a written project report and present the project in front of the class. Projects can be done in groups of two or individually. I strongly encourage you to work in groups of two. This way you get to know other students, you get to collaborate on research projects, and you can aim for a more ambitious project.

Your project should fall in one of the following categories:

Here are some suggested project ideas, as examples.

Project Proposal

You should submit a Project Proposal by 11:59pm on Friday, October 25. The proposal should be one page. It is ok if it is a little longer but I will stop reading after the second page. It should describe the question you are investigating and your goals for the project. You should propose a range of goals, from ambitious ones to ones that you are certain you can achieve. The proposal should also briefly describe existing work related to your question and goals; you can limit yourself to the one or two most relevant papers. Aim for one paragraph to describe the question, another for the research goals, and a third for the related work.

Proposals should be submitted via MarkUs ( by 11:59pm on October 25. They will not be graded: the goal is to give you feedback before you start working on your project. If you are working in a group, identify both members of the group in the proposal, and submit as a group in MarkUs. (Let me know if you do not know how to do that.)

Here are some suggested project ideas. These are not project proposals: before submitting one of them, you want to flesh them out by specifying concrete goals and giving a more careful comparison with one or two most relevant papers. These project ideas can be claimed on a first-come, first-served basis, so email me as soon as you decide that you want to pursue one of them.

You can take a look at the accepted papers in the upcoming workshop TPDP 2019, differential privacy papers from ICML, and presentations from the five workshops during the Simons Institute Semester. You can also take a look at the workshops TPDP 2018 and Banff, the privacy sessions of CCS 2018, and the publications list of the Harvard Privacy Tools Project.

I have created a Piazza discussion board for the class. Feel free to use it to ask questions or to find a partner for the project.

Project Presentation

The presentations for your projects will be on December 5 (the last day of class), at the usual hour, and in the usual meeting place. Prepare an 8 minute (do not go over time!) presentation that describes briefly what you are studying, and any (partial) results you have already achieved. It is ok to indicate that something is still in progress. In case your project is a survey, your presentation should point out some key ideas in the work you are summarizing, and present an open problem, and describe why it is important or interesting. The main thing I will be looking for in the presentations will be clarity.

I will bring cookies.

Project Report

Your written project reports are due on December 20. Aim for 5-7 typed pages, in A4 format with reasonable font and margins (so, for example, 10-11pt font and 1 inch margins all aroound). You can submit your report by emailing it directly to me.

Your report should clearly describe the problem, or set of problems, that is addressed by your project. Then summarize the most relevant related work, including a brief summary of the main results. Then describe your findings and end with a discussion of directions for future research.

Of course, reports will be different for different types of projects:

Schedule of Lectures

Find a tentative schedule of lectures below. [V] denotes Vadhan's notes, and [DR] denotes Dwork and Roth's monograph: see links above.

  1. Attacks: reconstruction and tracing attacks against privacy;
    Reading: Lecture notes.
  2. Differential Privacy: definition; equivalent formulations; Randomized Response;
    Reading: Section 1 of [V], Chapter 2 of [DR]
  3. Basic Mechanisms and Properties of DP: Laplace and Gaussian noise mechanisms; composition; group privacy;
    Reading: Section 2 of [V], Sections 3.1-3.3 and beginning of 3.5, Appendix A of [DR]
  4. Exponential Mechanism: private PAC learning;
    Reading: Sections 3.4 and 11.1 of [DR], Section 8.1. of [V]
  5. Empirical Risk Minimization: Private Gradient Descent;
    Reading: Lecture notes.
  6. Privacy and Adaptive Data Analysis: why private data analysis on the sample generalizes to the population;
    Reading: Lecture notes.
  7. Learning the Database: the multiplicative weight update algorithm;
    Reading: Sections 4.2. of [DR], and Lecture notes
  8. The Projection Mechanism;
    Reading: Section 12.4. of [DR], Sections 7.3. of [V], and Lecture notes
  9. Projection Mechanism Analysis, Marginal Queries
    Reading: Lecture notes
  10. The Sparse Vector Technique: answering queries online;
    Reading: Section 3.6. of [DR]
  11. Class Presentations.