Advanced Data Systems (CSC2508), Fall 2019


Course Description

The maturity of several Deep Learning technologies has influenced the design of data management systems and architectures and prompted a rethinking of several of their design principles. The goal of this course is two-fold. First, we present a review of the fundamental design components of modern data management architectures, including a review of relational and NoSQL systems. Second, we review and explore how fundamental components can be redesigned by incorporating Deep Learning principles and techniques, and explore the resulting (performance and system) implications. We will also review and investigate a few novel data management application scenarios that are uniquely enabled by merging Deep Learning and query processing technologies.

This is a graduate seminar course. There will be a combination of presentations by the instructor and the participants. All participants are expected to actively engage in the course, be familiar with all the material presented, and drive the discussions for the part of the course they are responsible for. The course involves a project; more details will be available in class.

Announcements and clarifications

Administrivia

Instructor: Nick Koudas
Lectures: BA025
Office: BA 5240
TA: TBD
Office hours: by appointment
Instructor telephone: 416 946-5819
Instructor email: my last name @ uoft cs domain
Course web page: here

Course structure

At the start of every lecture, I will ask a member of the class to summarise the main topic that we will discuss. I would be interested to hear your thoughts on why the paper under discussion is important and whether there is anything you would challenge in its methodology or thesis. This is your chance to bring up any issues that demonstrate your deep understanding of the topic.

You are expected to actively participate in the discussions for each lecture and be fully familiar with the papers presented. For each paper you are assigned to present, you are expected to do all the background research and collect all suitable references. You will share your slide deck with the class and make it available through the course shared folder, along with all references you used.

The class folder with access to reading material (and presentations as they become available) is here

Readings

Review of relational technology (9/9)

Overview of NoSQL (9/16)

Indexing (9/23)

Query Optimization (9/30)

Selectivity Estimation (10/7)

Data Exploration (10/21)

Entity Resolution (10/28)

Entity Resolution Optimizations (11/11)

Data Management for Video Streams (11/18)

RDBMS for Machine Learning (11/25)

Project Presentations (12/2)

Other Resources

Breakdown of marks

The course mark will be broken down into the categories listed below, with points assigned as indicated:

Weight | Item | Minimal mark | Moderate mark | High mark
30% | Participation | Present | Talkative | Insightful comments or questions
20% | Presentations | Factually correct | Designed and delivered well | Transmits effectively key points, implications, etc.
5% | Quality of feedback to peers | Focus on nitpicks and minutiae | Suggest incremental improvements | Identify structural strengths and flaws
45% | Final project | Unambitious and/or badly planned | Partially implemented and/or poorly presented | Implemented successfully with key learning points presented

Project proposals

The course is associated with a project. Proposed class projects will be described by the instructor. Feel free to discuss your ideas with the instructor and propose your own project. However, the project you propose HAS to be associated with the material in the class. This is very important and is not up for discussion. The project should have a research component. Project ideas will be outlined in class, but you are responsible for proposing your project. Some background reading is associated with each project. The project proposal (due Oct 21) should contain the following information: Project proposals should be a couple of pages at most. A project status report is due on Nov 11. The status report should describe progress to date and what is expected to be accomplished by the final project presentation day.