| Faculty name: | Ryan Johnson |
|---|---|
| Research area: | Database systems |
| Campus address: | BA 5226 |
| Campus phone: | 416-946-7069 |
| Email address: |
ryan.johnson [at] cs.toronto.edu
|
| Number of students: | 1-2 |
| Skills required: |
|
Indexing is a key database technology that allows the systems to quickly find and retrieve data items with specific attributes from among millions (or even billions) of candidates. The B+Tree has long been a data structure of choice due to its efficiency at indexing both disk- and memory-resident data, and the fact that it maintains its entries in sorted order. However, most performance enhancing techniques proposed in the literature assume small integer keys, making them unsuitable for real-world data where the attribute of interest occupies more than 32 bits: large integers, strings, and groups of such values. Large, variable-length keys pose challenges for efficient retrieval because they reduce the effectiveness of CPU caches and require more work to process.
We're currently surveying the state of the art in data structures for indexing variable-length keys. By combining the best techniques and improving on them, we are developing a data structure that performs equally well for short and long keys and that matches or outperforms the best existing implementations. Your task will be to implement some of the features of these data structures and, time permitting, help to design and run tests on them. Along the way you'll become familiar with a variety of clever indexing techniques, as well as gaining valuable skills in performance tuning at the systems level.