Class resources
This page contains tutorials, assignments, datasets, and other
materials we use or develop in class.
Datasets
- Information about
1000 clinical trials carried out in 2010. The data are all
publicly available
at www.clinicaltrials.gov,
and the search engine there may help you debug your queries. See
also http://clinicaltrials.gov/ct2/info/linking.
- Reed community college course schedule (compressed XML file).
- Sudoku project files
(compressed tarball, 15kB; posted 31 Jan). A collection of python
code and saved game files for our embedded SQL project. Make sure to
also grab the reading material about Sudoku rules (on the Readings
page). Also
available: sudoku-shell.sh, which
loads a Sudoku .sav file into a stand-alone sqlite session so you
can query it interactively (1kB, posted 7 Feb).
- Students in a course (compressed sqlite3 database, 2kB). A synthetic dataset containing a hypothetical class list, with student names, contact info, and final grade. Disclaimer: All students appearing in this dataset are fictitious. Any resemblance to real students—living, dead, or undead—is purely coincidental.
- DineSafe inspection reports (compressed XML format, 1.1MB). Also available as a ready-loaded sqlite3 database. A collection of food safety inspections carried out between 2010 and 2012 in Toronto, and made available online under the Toronto DineSafe Inspection Program.
- TPC-H dataset (compressed tarball, 2.1MB; v2 uploaded 9 Jan). The TPC-H benchmark is a widely-used synthetic query workload. This is a small (1/100th scale factor) TPC-H dataset, generated by a modified version of the official TPC-H dbgen utility.
- IMDb dataset (compressed tarball, 1.5MB; v2 uploaded 9 Jan). This is a small subset of the official public version of the Internet Movie Database. IMDb authorizes use of this data only for personal and non-commercial use (we have verified that this includes classroom instruction); see the official IMDb licensing terms for details
Tutorials and exercises
- Preview of supplemental material for the final exam (posted 10 Apr).
- Practice questions for the final exam (updated 15 Apr) and solutions (updated twice on 15 Apr).
- XQuery exercises for reed.xml (posted 3 Apr) and answers (1.1, 6.1, 6.2, 2.1, 5.1, 5.2; updated 8 Apr).
- XPath exercises
and answers (posted 25
Mar, answers posted 29 Mar).
- Examples of SQL anomalies that can
arise at weak isolation levels (posted 21 Mar).
- Triggers and answers (in-class assignment, posted 18 Mar).
- SQL superhero database and answers (in-class assignment, posted 14 Mar, solutions posted 18 Mar).
- FD projection homework (in-class homework, due 13 Mar).
- FD projection problem set and answers (in-class exercise, added 7 Mar).
- Minimal basis problem set and answers (in-class exercise, updated 6 Mar).
- Superhero E/R modeling problem (in-class exercise, posted 4 Mar).
- Practice teaching E/R modeling problem and one possible (not optimal) solution (in-class exercise, updated 6 Mar).
- Join query practice problems and answers (posted 5 Feb).
- Nested query practice problems and answers (posted 4 Feb).
- Aggregation query practice problems and answers (posted 4 Feb).
- Nested query exercises Posted 30 Jan. Answers posted 2 Feb.
- Aggregation exercises Posted 24 Jan. Answers posted 28 Jan.
- Outer join exercises Posted 17 Jan. Answers posted 28 Jan.
- Inner join exercises Posted 17 Jan. Answers posted 28 Jan.
- Jigsaw Joins Posted 15 Jan.
- Basic RA/SQL exercises Posted 14 Jan. Answers posted 28 Jan.
- Setup guide for installing database engines and datasets under Cygwin. Linux and Mac should be fairly similar, except you should use your package manager of choice instead of the cygwin installer. Updated 7 Jan.
- Data cleaning assignment for the first week of class. Posted 5 Jan.