===========================================================================
CSC 263H                Lecture Outline for Week 3             Winter 2004
===========================================================================

[[Q: denotes a question that you should think about and that will be
answered during lecture.]]

Note: Some of the first material was already covered in UTM and UTSc
sections of the course.  It is included here for St. George students who
will cover it in week 3.

Randomized QuickSort:

  - Average-case analysis above works only if each permutation is equally
    likely.  In practice, this may not be true.

  - Instead of relying on unknown distribution of inputs, randomize
    algorithm by picking random element as pivot.  This way, random
    behaviour of algorithm on any fixed input is equivalent to fixed
    behaviour of algorithm on a uniformly random input, i.e., the expected
    running time of the randomized algorithm is Theta(n log n) on every
    input, including the worst one.

  - In general, randomized algorithms are good when there are many good
    choices but it is difficult to find one choice that is guaranteed to
    be good.

------------
Dictionaries  [sections 2.5.1 and 3.1]
------------

Dictionary ADT (slightly simpler version than textbook):

    Objects:
        Sets S, each of whose elements x has a "key" field key[x] that
        must come from a totally ordered universe (e.g., key[x] could be
        integer, character, etc.).  Not all keys need to be distinct.

    Operations:
        ISEMPTY(S):   check whether S is empty or not
        SEARCH(S,k):  return x in S s.t. key[x] = k, or NIL if no such x
        INSERT(S,x):  insert x in S
        DELETE(S,x):  remove x from S (given _element_ x, not just its key)
                      [[Q: why not DELETE(S,k)?]]

    Many variations possible:
      - require unique keys / allow duplicate keys,
      - add operations to change information associated with a specific
        key,
      - etc.

Data structures?
    [[Think of possibilities, from simple to complex.]]

-------------------
Binary Search Trees  [section 3.1]
-------------------

For definition and implementation of operations SEARCH, INSERT, DELETE,
see section 3.1 of textbook (since this is covered in CSC148H/A58H).

Worst-case running time for all three operations is O(height of tree),
and the height ranges between Theta(log n) and Theta(n).

Average-case performance of BSTs:  [NOT IN TEXTBOOK]

  - Performance depends completely on height of tree, so analyse average
    height of a "random BST".

  - Definition of "random BST" is tricky: there are many possibilities,
    most of which are very hard to analyse.  In fact, if we allow trees
    built by arbitrary sequences of "INSERT" and "DELETE" operations, the
    "average" height is not known.

  - To simplify analysis, consider randomly-built trees: start from empty
    tree and insert values from {1,2,...,n} in random order (where each
    of the n! possible orderings is equally likely).

    [[Think about what happens and try to figure out answer.
    Hint: idea similar to average-case analysis of QuickSort...]]

-----------
2-3-4 Trees  [section 3.3.2]
-----------

A 2-3-4 tree (also called a (2,4) tree) is similar to the BST in that it
stores values in the internal nodes and has a property relating the values
stored in a subtree to the values in the parent node.  It is different
because an internal node can have 2, 3 or 4 children.  (This is the "size
property".)

Here is an example of a 2-3-4 tree.

                    ______17_____
                   /             \
           __4____9_           20__30__41__
          /    |    \         /   |   |    \
      1_2_3   7_8    12     18   24   33   56_80

Consider the node (4,9): it contains two values and has three children
(the nodes (1,2,3), (7,8), and (12)).

[[Q: What are the possible numbers of values that an internal node of a
2-3-4 tree may contain?]]

Notice the property relating the values in a subtree to the values in the
parent node of its root.  This is formally defined in section 3.3.1
(Multi-way Search Trees) in the text but is quite simple to see informally.
  - Consider the subtree rooted at node (20,30,41).  All values in this
    subtree must be greater than 17.  Now consider the subtree rooted at
    node (7,8).  All values in this subtree must be greater than 4 and
    less than 9.

    [[Q: What range of values would be allowed in any subtree rooted at
    node (33)?]]

    [[Q: If we were to insert the value 15 into the tree above, where
    would it need to go to preserve the order property?]]

  - Formally, we introduce some notation.  A node with d children is
    called a d-node; the values stored at the node are labelled
    k_1, k_2, ..., k_{d-1} and the children are labelled
    v_1, v_2, ..., v_d.  The ordering property states that
    k_1 < k_2 < ... < k_{d-1} and every value k stored in the subtree
    rooted at v_i satisfies k_{i-1} < k < k_i (with the convention that
    k_0 = -oo and k_d = +oo).

In the example above we have only shown the internal nodes; we could also
have drawn leaves (external nodes) below each of the nodes (1,2,3), (7,8),
(12), (18), (24), (33) and (56,80).  In fact we need to consider these
empty leaf nodes for the size property (when expressed in terms of
children) to hold on these internal nodes.

Another property which must hold in 2-3-4 trees is that all external nodes
must be at the same depth.  (This is the "depth property".)

[[Q: Why do we want this property?]]

[[Q: Given these two properties, what can we say about the height of a
2-3-4 tree which stores n items?
Hint: First consider how to relate the number of external nodes in a 2-3-4
tree to the number of values stored in the internal nodes.  Next let h be
the height of a 2-3-4 tree and then determine the maximum and minimum
number of external nodes as a function of h.]]

[[Now, think about the three non-trivial operations (search, insert,
delete) and how to perform them on a 2-3-4 tree.]]
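
The size, ordering and depth properties above can be checked mechanically.
What follows is only a minimal sketch in Python, not code from the textbook:
the names Node and is_234_tree are hypothetical, and representing each
external (empty leaf) node by None is an assumption made for illustration.
The tree built at the end is the example tree drawn earlier.

```python
import math


class Node:
    """Internal node (hypothetical representation): d-1 keys, d children."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        if children is None:
            # Assumption: None stands for an external (empty leaf) node.
            children = [None] * (len(self.keys) + 1)
        self.children = list(children)


def is_234_tree(root):
    """Return True iff the size, ordering and depth properties all hold."""
    external_depths = set()

    def check(node, lo, hi, depth):
        if node is None:                      # external node: record depth
            external_depths.add(depth)
            return True
        d = len(node.children)
        if d not in (2, 3, 4) or len(node.keys) != d - 1:
            return False                      # size property violated
        # bounds = k_0, k_1, ..., k_d with k_0 = lo and k_d = hi
        bounds = [lo] + node.keys + [hi]
        if any(a >= b for a, b in zip(bounds, bounds[1:])):
            return False                      # ordering property violated
        # Subtree v_i (index i-1 here) may only hold k_{i-1} < k < k_i.
        return all(check(child, bounds[i], bounds[i + 1], depth + 1)
                   for i, child in enumerate(node.children))

    ok = check(root, -math.inf, math.inf, 0)
    return ok and len(external_depths) == 1   # depth property


# The example tree from the outline.
root = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])
print(is_234_tree(root))   # True
```

Passing the bounds (lo, hi) down the recursion is what enforces the
condition k_{i-1} < k < k_i for every value in a subtree, not just for the
values stored in the child node itself.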