CSC2419 (Winter 2017): Topics in Cryptography

Algorithms and Complexity in Private Data Analysis

Course Information

Instructor Aleksandar Nikolov
Office: Sandford Fleming 2301B
E-mail:
Location Bahen 2179 (BA2179)
Time Tuesday, 4-6pm
Office hours by appointment.
Reading As a reference we will use the monograph by Dwork and Roth (from Aaron Roth's website).
We will also use the recent tutorial notes by Salil Vadhan.
Grading Based on scribing one lecture (45%), one problem set (15%), and a presentation of a recent research paper (40%).

Course Description

The practical applicability of data analysis to sciences and decision-making is often limited by individual privacy concerns. Some of the most interesting data sets - medical studies, genetic data, movie reviews, search logs, social network connections - contain sensitive information about people represented in the data set. If individuals do not think their privacy concerns are addressed, they may refuse to participate in data collection, or not answer surveys truthfully. These issues limit the validity of the data analysis results, and make balancing privacy and usefulness essential to many applications of big data. In this course we will study privacy in data analysis from a rigorous theoretical perspective. We will focus on Differential Privacy: a recent approach to achieving strong provable privacy guarantees in the analysis of sensitive data. Informally, a data analysis algorithm is differentially private if changing the data of a single individual changes the output distribution of the algorithm only slightly. This guarantee ensures that the privacy risk to any individual increases only slightly by participating in data collection. Our focus will be on the design of efficient differentially private algorithms, but we will also look at some negative results that show the limits of private data analysis. In the process, we will learn about fascinating connections between differential privacy and game theory, learning theory, and geometry.

This is a theoretical, proof-based course. It will require mathematical maturity and a solid background in probability theory, and the design and analysis of algorithms.

Scribe Notes

Use this template for your scribe notes. Do not forget to fill in the lecture number, the date, and your name. The notes should read like a self-contained text and include everything that was covered in class (rather than just record what was on the board). Please contact me if anything in the lecture you are scribing was not clear. I expect the first draft of the notes the Sunday before the meeting of the next class, so that people who have missed a lecture can look over them.

Problem Set and Papers for Presentation

Problem set for homework. The problem set is due on the last day of class.

Here is a suggested list of papers. You can use the list to choose a paper to present. The presentation will be 20 minutes, with 5 minutes for questions. You do not have to use a paper from the list: it's meant as resource, not a constraint. Please email me the paper you choose to present by March 25th.

Doodle poll for a make up class: https://beta.doodle.com/poll/m9i5dkfpus8cc65k. The plan is to meet for an hour and have two presentations. The other four presentations will be on the last day of class.

Schedule of Lectures

Find a tentative schedule of lectures below. [V] denotes Vadhan's notes, and [DR] denotes Dwork and Rothblum's monograph: see links above.

  1. Attacks (Lecture notes): reconstruction and tracing attacks against privacy;
  2. Differential Privacy: definition; equivalent formulations; Randomized Response;
    Reading: (Section 1 of [V], Chapter 2 of [DR])
  3. Basic Mechanisms and Properties of DP: Laplace and Gaussian noise mechanisms; composition; group privacy;
    Reading: (Section 2 of [V], Sections 3.1-3.3 and beginning of 3.5, Appendix A of [DR])
  4. Exponential Mechanism: computing the median; synthetic data;
    Reading: (Sections 3.4 and 4.1 of [DR])
  5. Advanced Composition (Lecture notes);
    Reading: (Sections 3.5 of [DR])
  6. Learning the Database (Lecture notes): the multiplicative weight update algorithm;
    Reading: (Sections 4.2. of [DR])
  7. The Projection Mechanism (Lecture notes);
    Reading: (Section 12.4. of [DR], Sections 7.3. of [V])
  8. Projection Mechanism Analysis, Marginal Queries: (Lecture notes)
  9. Private Boosting: turning average into worst-case error guarantees;
    Reading: (Section 6.1. of [DR])
  10. Empirical Risk Minimization: adding noise to the output or to the objective;
  11. Class Presentations
  12. Class Presentations