Tutorial 2 - Hashing & Sorting Hashing: -------- Hashing - based on the concept that it is much easier to search by a numerical key of defined length than by arbitrary non-numerical key. Uses a function to generate the key from the search string of characters. Good hash function: - produced a unique result most of the time - produce the same result from the same input - should be calculated relaively easily Conflict Resolution: - Look for next available spot (linear probing) - Quadratic: try spot + n; spot + n^2 (to get away from congested area) - Chaining - make a list of all items that hash to this value Example function used: h(u) = (11*u) mod 7 + 1 1Club = 1, 1Diamond =2, 1Heart = 3, 1Spade = 4, 2CLUB = 5 Bubble Sort: ------------ 1. Compare adjacent elements. If the first is greater than the second, swap them. 2. Do this for each pair of adjacent elements, starting with the first two and ending with the last two. At this point the last element should be the greatest. 3. Repeat the steps for all elements except the last one. 4. Keep repeating for one fewer element each time, until you have no more pairs to compare. Interpolation Sort: ------------------- 1. Inserts the elements one at a time 2. Tries to guess the position (based on some statistics) 3. Compares elements at right and left of guessed position, then re-guesses on the correct side (unless the correct position is already found) Can be pretty good in terms of number of comparison, but lots of moving around of data limits usefulness in practice. Radix/Bucket Sort: ------------------ 1. take the least significant digit (or group bits) of each key. 2. sort the list of elements based on that digit, but keep the order of elements with the same digit (this is the definition of a stable sort). Do this by 3. repeat the sort with each more significant digit. Radix/Bucket Example: Here is a simple example of the sort. Suppose the input keys are 34, 12, 42, 32, 44, 41, 34, 11, 32, and 23. Four buckets are appropriate. The first pass distributes them by the least significant digit. After the first pass we have the following, where each line is a bucket. Bucket 1: 41 11 Bucket 2: 12 42 32 32 Bucket 3: 23 Bucket 4: 34 44 34 We collect these, keeping their relative order: 41 11 12 42 32 32 23 34 44 34. Now we distribute by the next most significant digit, in this case, the highest digit, and we get the following. Bucket 1: 11 12 Bucket 2: 23 Bucket 3: 32 32 34 34 Bucket 4: 41 42 44 When we collect them, they are in order: 11 12 23 32 32 34 34 41 42 44. Merge ------- 1. Divide the unsorted list into two sublists of about half the size 2. Sort each of the two sublists 3. Merge the two sorted sublists back into one sorted list. Iteratively (no recursion) 1. Think of list (n items) as n lists of 1 item each (thus all are sorted) 2. Merge successive pairs of lists (after 1st pass, n/2 lists on n elements) 3. After 3 passes, n/4 lists of 4 elements; etc. until sorted Sources used: * Wiki * NIST * Yahoo Groups