Optimal binary search trees

Next: November 22 Up: November 20 Previous: November 20

Optimal binary search trees

Suppose you have data organized into a BST by keys. It seems wise to keep the most frequently accessed keys near the root, and the rarely accessed keys further down. In fact, you can quantify this wisdom by decreeing that it takes one step to get to the root, two steps to get to its children, etc., and (if you know the frequency, say of key ) you can calculate the cost of accessing as $f_i\times K_i$ . If you sum this up over all $f_i\times K_i$ , you get the weighted internal path length of your BST. Inspection shows that this may be different for two BSTs with identical keys, so structure matters.

Suppose you're given an ordered list of keys, $K_1 < K_2, \ldots, K_n$ , and each key has an associated frequency $f_1, \ldots, f_n$ . We want to build a BST that minimizes the work we expect to do accessing each key. Suppose we know that a BST with a minimal weighted internal path has root , then it follows that its left subtree (with keys $K_1, \ldots, K_{j-1}$ ) and its right subtree (with keys $K_{j+1}, \ldots, K_n$ ) must also have minimal weighted internal path length. This leads to the recurrence, where represents the lowest possible cost of a BST with keys $K_i, \ldots, K_k$ , and is the cost added by making the node with key the root for subtrees $K_i, \ldots, K_{j-1}$ and $K_{j+1}, \ldots, K_k$ .

$\displaystyle C[i][k] = \begin{cases} 0 & k < i \\ f_i & k == i \\ \min{i\leq j \leq k} \{C[i][j-1] + C[j+1][k] + w(i,j,k)\} & \text{otherwise} \end{cases}$

Notice that when we make the node with key

the root of subtrees with nodes $K_i, \ldots, K_{j-1}$ and $K_{j+1}, \ldots, K_k$ , we increase all the paths to nodes in the subtrees by 1, so we add $f_i + \cdots + f_{j-1}$ and $f_{j+1} + \cdots + f_k$ to our cost, plus the cost (

) of getting to the new root note. That gives us a formula for

(which, curiously, doesn't depend on

You should be able to modify the code for matrix multiplication to find an optimal BST, using a triply-nested loop (exercise). Some care is needed in manipulating the indices. This problem is magnified in Assignment 4, where we re-tool the indices in the pursuit of generic code.

Next: November 22 Up: November 20 Previous: November 20

Danny Heap 2002-12-16