DCS Summer 2012 Research Awards -- Project Description

One tree to rule them all: A cache-friendly index for long and variable-length keys

Faculty name: Ryan Johnson
Research area: Database systems
Campus address: BA 5226
Campus phone: 416-946-7069
Email address: ryan.johnson [at] cs.toronto.edu
Number of students: 1-2
Skills required:
  • C/C++
  • data structures
  • performance tuning and debugging

Brief project description:

Indexing is a key database technology that allows the systems to quickly find and retrieve data items with specific attributes from among millions (or even billions) of candidates. The B+Tree has long been a data structure of choice due to its efficiency at indexing both disk- and memory-resident data, and the fact that it maintains its entries in sorted order. However, most performance enhancing techniques proposed in the literature assume small integer keys, making them unsuitable for real-world data where the attribute of interest occupies more than 32 bits: large integers, strings, and groups of such values. Large, variable-length keys pose challenges for efficient retrieval because they reduce the effectiveness of CPU caches and require more work to process.

We're currently surveying the state of the art in data structures for indexing variable-length keys. By combining the best techniques and improving on them, we are developing a data structure that performs equally well for short and long keys and that matches or outperforms the best existing implementations. Your task will be to implement some of the features of these data structures and, time permitting, help to design and run tests on them. Along the way you'll become familiar with a variety of clever indexing techniques, as well as gaining valuable skills in performance tuning at the systems level.

Back to the index.