AB 114
Mondays 9-11am
Nick Koudas
of final grade
10% each
Proposal due November 3 2025
Final report due end of year
This course explores advanced topics in data systems with a focus on vector databases and unstructured data query processing. Students will learn about information retrieval, embeddings, indexing techniques for high-dimensional data, and modern approaches to multimodal query processing. We will also cover recent techniques for querying unstructured data and touch upon numerous research directions in this area.
Lecture | Title | Readings & Materials |
---|---|---|
1 | Introduction / Fundamentals of Information Retrieval | Stanford IR Book (Chapters 1-4) |
2 | Information Retrieval and Ranking | Stanford IR Book (Chapters 6, 11, 12) |
3 | Introduction to Embeddings | Chapter 5 (Embeddings); GloVe: Global Vectors for Word Representation; Efficient Estimation of Word Representations in Vector Space |
4 | Contextual Embeddings | Chapters 8 and 10 (up to 10.4); suggested: Chapter 7 |
5 | Vector Indexing: Basics | Chapter 10 (10.3 onwards); R-trees |
6 | Indexing in High Dimensions | Until Section 3.8; Similarity Search in High Dimensions via Hashing |
7 | Quantized / Table Based Indexing | Reading materials to be added |
8 | Graph Indexing | Reading materials to be added |
9 | Graph Indexing (Continued) | Reading materials to be added |
10 | Filtered Search | Reading materials to be added |
11 | Document Query Processing | Reading materials to be added |
12 | Document Query Processing (Continued) | Reading materials to be added |
The research project is a significant component of this course. Students are encouraged to propose their own projects related to vector databases and unstructured data processing. Students should form teams of up to 2 people (groups of size 1 are fine as well). The instructor will also provide suggested topics and will highlight open problems during the lectures.
CSC2508 Project Proposal Instructions: For the project proposal, you are required to submit a 2-3 page document outlining your planned research project. This proposal should be structured like a formal academic research proposal and must clearly define a project that is directly related to the core topics of CSC2508. Your document must articulate the core research question, the specific objectives of your study, and the experimental component you will use to explore the course concepts. It should include a brief review of relevant background literature to situate your work, a detailed description of your proposed methodology (including the datasets, algorithms, or systems you plan to use and how you will evaluate your results), and a discussion of your expected outcomes. The goal of the proposal is to demonstrate a well-scoped, feasible, and intellectually meritorious plan for your term-long project, which will serve as your roadmap for the rest of the semester.
CSC2508 Final Project Report Instructions: The final project report is a comprehensive account of your completed work, structured as a 10-page research paper following the standard ACM Conference Proceedings format. This report must go beyond the initial proposal to include a full introduction, detailed background, a complete explanation of your methodology, a thorough presentation and analysis of your experimental results, and a discussion of their significance. You must connect your findings back to the core concepts of the course and the related work you cited. The report should tell the complete story of your project: what you set out to do, what you actually did, what you discovered, and what it all means, including an honest reflection on any limitations and potential avenues for future work. Adherence to the ACM formatting guidelines and page limit is mandatory.