CSC2508 Advanced Data Systems

Vector Databases and Unstructured Data Query Processing

Location: AB 114

Time: Mondays 9-11am

Instructor: Nick Koudas

Midterm Exam: 40% of final grade

Quizzes (2): 20% (10% each)

Research Project: 40%

Proposal due November 3, 2025

Final report due at the end of the year

Course Syllabus

This course explores advanced topics in data systems, with a focus on vector databases and unstructured data query processing. Students will learn about information retrieval, embeddings, indexing techniques for high-dimensional data, and modern approaches to multimodal query processing. We will also cover recent techniques for querying unstructured data and touch on numerous research directions in this area.

Lecture | Title | Readings & Materials
1 | Introduction / Fundamentals of Information Retrieval | Stanford IR Book (Chapters 1-4)
2 | Information Retrieval and Ranking | Stanford IR Book (Chapters 6, 11, 12)
3 | Introduction to Embeddings | Chapter 5 (Embeddings); GloVe: Global Vectors for Word Representation; Efficient Estimation of Word Representations in Vector Space
4 | Contextual Embeddings | Chapters 8 and 10 (up to 10.4); suggested: Chapter 7
5 | Vector Indexing: Basics | Chapter 10 (10.3 onwards); R-trees
6 | Indexing in High Dimensions | Until Section 3.8; Similarity Search in High Dimensions via Hashing
7 | Quantized / Table-Based Indexing | Reading materials to be added
8 | Graph Indexing | Reading materials to be added
9 | Graph Indexing (Continued) | Reading materials to be added
10 | Filtered Search | Reading materials to be added
11 | Document Query Processing | Reading materials to be added
12 | Document Query Processing (Continued) | Reading materials to be added

Project Information

The research project is a significant component of this course. Students are encouraged to propose their own projects related to vector databases and unstructured data processing. Students may work individually or in teams of up to two people. The instructor will also provide suggested topics and will highlight open problems during the lectures.

CSC2508 Project Proposal Instructions: For the project proposal, you are required to submit a 2-3 page document outlining your planned research project. This proposal should be structured like a formal academic research proposal and must clearly define a project that is directly related to the core topics of CSC2508. Your document must articulate the central research question, the specific objectives of your study, and the experimental component you will use to explore the course concepts. It should include a brief review of relevant background literature to situate your work, a detailed description of your proposed methodology (including the datasets, algorithms, or systems you plan to use and how you will evaluate your results), and a discussion of your expected outcomes. The goal of the proposal is to demonstrate a well-scoped, feasible, and intellectually meritorious plan for your term-long project, which will serve as your roadmap for the rest of the semester.

CSC2508 Final Project Report Instructions: The final project report is a comprehensive account of your completed work, structured as a 10-page research paper following the standard ACM Conference Proceedings format. This report must go beyond the initial proposal to include a full introduction, detailed background, a complete explanation of your methodology, a thorough presentation and analysis of your experimental results, and a discussion of their significance. You must connect your findings back to the core concepts of the course and the related work you cited. The report should tell the complete story of your project: what you set out to do, what you actually did, what you discovered, and what it all means, including an honest reflection on any limitations and potential avenues for future work. Adherence to the ACM formatting guidelines and page limit is mandatory.

Important Dates: