Suppose you have data organized into a BST by keys. It seems wise to
keep the most frequently accessed keys near the root, and the rarely
accessed keys further down. In fact, you can quantify this wisdom by
decreeing that it takes one step to get to the root, two steps to get
to its children, etc., and (if you know the frequency, say of
key
) you can calculate the cost of accessing
as
. If you sum this up over all
, you get the
weighted internal path length of your BST. Inspection shows
that this may be different for two BSTs with identical keys, so
structure matters.
Suppose you're given an ordered list of keys,
, and each key has an associated frequency
. We
want to build a BST that minimizes the work we expect to do accessing
each key. Suppose we know that a BST with a minimal weighted internal
path has root
, then it follows that its left subtree (with keys
) and its right subtree (with keys
) must also have minimal weighted internal path length.
This leads to the recurrence, where
represents the lowest
possible cost of a BST with keys
, and
is
the cost added by making the node with key
the root for subtrees
and
.
You should be able to modify the code for matrix multiplication to find an optimal BST, using a triply-nested loop (exercise). Some care is needed in manipulating the indices. This problem is magnified in Assignment 4, where we re-tool the indices in the pursuit of generic code.