CSC2508 Advanced Data Systems

Vector Databases and Unstructured Data Query Processing

Location

AB 114

Time

Mondays 9-11am

Instructor

Nick Koudas

Midterm Exam

40%

of final grade

Quizzes (2)

20%

10% each

Research Project

40%

Proposal due November 3, 2025

Final report due end of year

Course Syllabus

This course explores advanced topics in data systems, with a focus on vector databases and unstructured data query processing. Students will learn about information retrieval, embeddings, indexing techniques for high-dimensional data, and modern approaches to multimodal query processing. We will also cover recent techniques for querying unstructured data and touch on open research directions in this area.

Lecture | Title | Readings & Materials
1 | Introduction / Fundamentals of Information Retrieval | Stanford IR Book (Chapters 1-4)
2 | Information Retrieval and Ranking | Stanford IR Book (Chapters 6, 11, 12)
3 | Introduction to Embeddings | Chapter 5 (Embeddings); GloVe: Global Vectors for Word Representation; Efficient Estimation of Word Representations in Vector Space
4 | BERT / Transformers | Reading materials to be added
5 | Multimodal Embeddings | Reading materials to be added
6 | Indexing in High Dimensions | Reading materials to be added
7 | Tree / Table Based Indexing | Reading materials to be added
8 | Graph Indexing | Reading materials to be added
9 | Graph Indexing (Continued) | Reading materials to be added
10 | Filtered Search | Reading materials to be added
11 | Document Query Processing | Reading materials to be added
12 | Document Query Processing (Continued) | Reading materials to be added

Project Information

The research project is a significant component of this course. Students are encouraged to propose their own projects related to vector databases and unstructured data processing. The instructor will also provide suggested topics and will highlight open problems during the lectures.

Important Dates: