University of Toronto - Fall 2000
Department of Computer Science

Week 12 - Complexity & Searching

Complexity

The complexity of an algorithm is the amount of a resource, such as time, that the algorithm requires. It is a measure of how 'good' the algorithm is at solving the problem. The complexity of a problem is defined as the complexity of the best algorithm that solves it. We've spent the whole course emphasizing correctness and understandability, sometimes at the expense of speed. When we analyze complexity, better = faster. But what does faster mean?

Measuring the speed of an algorithm

When you want to compare the speed of two algorithms, you can't just implement them in programs and time the programs with a stopwatch, because the answer would be muddled by interfering factors:

* speed of computer
* choice of programming language
* choice of compiler and execution environment
* random events caused by other programs running during execution of the test
* quality of programming

Instead, we isolate the "important operations" - that is, the ones that take longest - and count them. We are really interested in the behaviour of an algorithm as the size of our data set increases. We want to know how long the algorithm takes, in terms of the size of our data set, or some other factor in our program. These are issues that will be examined in more detail in CSC148 (CSCA58 at Scarboro).

Phone Book Example

Suppose we had a telephone book, b, and we wanted to look up a name, x, and suppose the book has n names. There are several ways to solve this problem:

* Algorithm 1: Linear Search
* Algorithm 2: Binary Search
* A heuristic...

Linear Search (Sequential)

Start with the first name, and continue looking until x is found. Then print the corresponding phone number.

Assume that we have an array, book, that stores references to objects containing the name and phone number of each person in the phone book. Assume there are n names in the phone book (n = book.length).

    int whereFound = -1;   // position where name found

    // Go through each entry, comparing name to x
    for (int i=0; i<n; i++) {
        // compare next name
        if (book[i].getName().equals(x)) {
            whereFound = i;
            break;
        }
    }

    if (whereFound != -1)
        System.out.println ("Number is " + book[whereFound].getNumber());
    else
        System.out.println ("Name not in phone book.");

How much time does this take? We look at the number of comparisons.

* Best case: 1 step
* Average case: n/2 steps
* Worst case: n steps

Really these should be multiplied by 2, because each iteration must also evaluate the loop condition i<n.

Will this work if the list is unsorted?
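The fragment above leaves the entry class unspecified. Here is a minimal runnable sketch, assuming a hypothetical PhoneEntry class that offers only the two methods the notes use (getName and getNumber). The class names, the sample names and the phone numbers are all made up for illustration. The sample book is deliberately out of alphabetical order, because the loop never relies on ordering.

```java
class LinearSearchDemo {
    // Returns the index of the first entry whose name equals x, or -1.
    static int find(PhoneEntry[] book, String x) {
        for (int i = 0; i < book.length; i++) {
            if (book[i].getName().equals(x)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up data, deliberately unsorted.
        PhoneEntry[] book = {
            new PhoneEntry("zack", "555-0104"),
            new PhoneEntry("anna", "555-0101"),
            new PhoneEntry("helen", "555-0103"),
        };
        int whereFound = find(book, "helen");
        if (whereFound != -1) {
            System.out.println("Number is " + book[whereFound].getNumber());
        } else {
            System.out.println("Name not in phone book.");
        }
    }
}

// Hypothetical entry type: the notes only assume getName() and getNumber().
class PhoneEntry {
    private final String name;
    private final String number;

    PhoneEntry(String name, String number) {
        this.name = name;
        this.number = number;
    }

    String getName() { return name; }
    String getNumber() { return number; }
}
```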

Linear Search (on a Vector)

Here's a variation of the previous linear search algorithm. Assume that we are searching a sorted Vector of Strings to see if a particular String is in that Vector.

    import java.util.*;

    public class LinSearch {

        public static void main (String[] args) {
            Vector list = new Vector();
            list.addElement ("bear");
            list.addElement ("dog");
            list.addElement ("fox");

            int position = search ("apple", list);
            System.out.println (position);
            list.insertElementAt ("apple", position);

            position = search ("elk", list);
            System.out.println (position);
            list.insertElementAt ("elk", position);

            position = search ("giant", list);
            System.out.println (position);
            list.insertElementAt ("giant", position);
        }

        public static int search (String s, Vector list) {
            int i = 0;

            // Inv: list[0..i-1] < s.
            // In English: everything we've examined so
            // far is less than s
            while (i != list.size() &&
                   ((String)list.elementAt(i)).compareTo(s) < 0) {
                i++;
            }
            return i;
        }
    }

Note that the above version of the Linear Search fixes the bug that was shown during the lecture. During the lecture we had originally shown a version of the search method that didn't have the i != list.size() condition. Without that condition, a search for a value larger than every element (such as "giant") runs past the end of the Vector.

Note that this method does not return -1 when s is not found. It always returns the index where s should be inserted into the list (regardless of whether or not s already exists in the list). We can determine if s was found using list.elementAt(returned_index).equals(s), provided that returned_index is less than list.size().

How fast is this code? In order to reason about this, we'll need to know:

* What are the time units? That is, what operations are we counting? - Comparisons.
* How many of them are there?

If there are n items in the list, then the loop will iterate at most n times. On average it iterates about n/2 times, and with two comparisons per iteration (the bounds check and the String comparison), that makes roughly n comparisons on average.

* Best Case:
* Average Case:
* Worst Case:

Suppose we used this algorithm by hand to search for a name in the Toronto phone book. Here n=1000000.
If the time to check each element by hand (a comparison) is 0.01 seconds (a very optimistic estimate for a search by hand), then the average time is:

    n x 0.01 sec = 1000000 x 0.01 sec = 10000 sec (more than 2.5 hours!)

We can cut the number of comparisons in half by cheating: we just add s at the end of the Vector, and then we know we'll find it before we fall off the end. This is called Linear Search with Sentinel.

Sentinel Linear Search (on a Vector) - faster

    import java.util.*;

    public class LinSearchSentinel {

        public static void main (String[] args) {
            Vector list = new Vector();
            list.addElement ("bear");
            list.addElement ("dog");
            list.addElement ("fox");

            int position = search ("apple", list);
            System.out.println (position);
            list.insertElementAt ("apple", position);

            position = search ("elk", list);
            System.out.println (position);
            list.insertElementAt ("elk", position);

            position = search ("giant", list);
            System.out.println (position);
            list.insertElementAt ("giant", position);
        }

        public static int search (String s, Vector list) {
            int i = 0;

            // The sentinel: s itself, added at the end.
            list.addElement(s);

            // Inv: list[0..i-1] < s.
            // In English: everything we've examined so
            // far is less than s
            while (((String)list.elementAt(i)).compareTo(s) < 0) {
                i++;
            }

            // Put things back the way they were.
            list.removeElementAt(list.size() - 1);
            return i;
        }
    }

This certainly cuts the time for Linear Search in half, but it still takes over an hour by hand. We can't tolerate a method that takes that long. We need a better method.

How much time does Linear Search with Sentinel take?

* Best Case:
* Average Case:
* Worst Case:
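One way to check the "half the comparisons" claim is to count comparisons directly. The sketch below is not part of the lecture code. It is an instrumented copy of both search loops, with hypothetical method names countPlain and countSentinel, and it uses Vector<String> so that no cast is needed. The bounds check and the String comparison are each counted as one comparison, as in the notes.

```java
import java.util.Vector;

class SentinelComparisonCount {
    // Plain linear search: the bounds check and the String comparison
    // are counted separately.
    static int countPlain(String s, Vector<String> list) {
        int comparisons = 0;
        int i = 0;
        while (true) {
            comparisons++;                      // i != list.size()
            if (i == list.size()) {
                break;
            }
            comparisons++;                      // compareTo(s) < 0
            if (list.elementAt(i).compareTo(s) >= 0) {
                break;
            }
            i++;
        }
        return comparisons;
    }

    // Sentinel version: only the String comparison remains in the loop.
    static int countSentinel(String s, Vector<String> list) {
        int comparisons = 0;
        int i = 0;
        list.addElement(s);                     // sentinel guarantees we stop
        while (true) {
            comparisons++;                      // compareTo(s) < 0
            if (list.elementAt(i).compareTo(s) >= 0) {
                break;
            }
            i++;
        }
        list.removeElementAt(list.size() - 1);  // put things back
        return comparisons;
    }

    public static void main(String[] args) {
        Vector<String> list = new Vector<String>();
        list.addElement("bear");
        list.addElement("dog");
        list.addElement("fox");
        System.out.println(countPlain("elk", list));       // prints 6
        System.out.println(countSentinel("elk", list));    // prints 3
        System.out.println(countPlain("giant", list));     // prints 7
        System.out.println(countSentinel("giant", list));  // prints 4
    }
}
```

The saving is exactly half when the search stops inside the list ("elk"), and slightly more than half of the work remains when it runs to the end ("giant"), because the sentinel itself costs one comparison.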

Binary Search

Compare x with the middle name, y, in the book. If x comes before y, recursively look for x in the first half. Otherwise recursively look for x in the second half. The search is finished when the beginning and end of the list are within one of each other.
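The description above is recursive. Here is a minimal recursive sketch under some assumptions of its own: the book is simplified to a sorted array of names, the class and method names are hypothetical, and begin and end start one position outside the array (at -1 and names.length) so that every element can be examined. It returns the index of the last name that is less than or equal to x, or -1 if every name comes after x.

```java
class RecursiveBinarySearch {
    // Searches the positions strictly between begin and end.
    // Returns the index of the last name <= x, or begin if there is none.
    static int search(String[] names, String x, int begin, int end) {
        if (begin == end - 1) {
            return begin;                 // begin and end are within one
        }
        int mid = (begin + end) / 2;
        String y = names[mid];            // the middle name
        if (x.compareTo(y) < 0) {
            return search(names, x, begin, mid);   // x comes before y
        } else {
            return search(names, x, mid, end);     // x is y or comes after y
        }
    }

    static int search(String[] names, String x) {
        return search(names, x, -1, names.length);
    }

    public static void main(String[] args) {
        String[] names = {"anna", "carl", "doug", "earl",
                          "fiona", "gerard", "helen", "zack"};
        int where = search(names, "helen");
        // Check that the name at the returned index really is x.
        if (where != -1 && names[where].equals("helen")) {
            System.out.println("Found at index " + where);   // index 6
        } else {
            System.out.println("Not found.");
        }
    }
}
```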

Example

Suppose our book, b, contains these n (8) names, and we are looking for the name, x, which is "helen." Suppose that each name has a corresponding phone number, although the numbers are not shown here.

    0       1       2       3       4       5       6       7
  [ anna    carl    doug    earl    fiona   gerard  helen   zack ]
    begin                   mid                             end
                            begin           mid             end
                                            begin   mid     end
                                                    begin   end

We find "helen" in 3 steps with the binary search, whereas it would have taken 7 steps with the linear search. [log2(n) = log2(8) = 3]

The number of times you go through the loop is proportional to the number of times you have to divide the list of size n by 2 in order to reduce it to a list that has begin and end within one ... log base 2 of n.

How much time does this take? (Depends on how it's implemented.)

* Best case: log2(n) steps (could be 1 step)
* Average case: log2(n) steps (could be log2(n) / 2 steps)
* Worst case: log2(n) steps

Assume that we have an array book of phone book entries (name & number), and we are searching for the phone number for the name x. Here is a version of the binary search that initializes the counters for begin and end to the first and last indices in the array.

    // Initialization
    int mid;
    int begin = 0;
    int end = book.length-1;

    // Loop until begin and end are within 1.
    while (begin != end-1) {
        mid = (begin+end)/2;
        if (book[mid].getName().compareTo(x) > 0)
            end = mid;
        else
            begin = mid;
    }

    // Print phone number found
    if (book[begin].getName().equals(x))
        System.out.println ("Found " + book[begin].getNumber());
    else
        System.out.println ("Value not found.");

Note that this algorithm has a bug when you search for something that is contained in the last index of the array, or if you search for something that is bigger than the last value in the array. The algorithm stops with begin at the second-last index in the array: we end up with begin=N-2 and end=N-1, because the loop ends as soon as begin and end are within one, so the last element is never compared with x.

Next we'll look at an algorithm that eliminates this problem. It will initialize begin to one smaller (-1) and end to one larger (N). If the element is in the last index of the array, the algorithm will finish with begin=N-1 and end=N (begin is the index of the value we are searching for).
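To see the bug concretely, here is a small sketch that runs the first version on an array of integers instead of phone book entries. That simplification, and the names FirstVersionBug and firstVersion, are assumptions made for illustration. The loop is otherwise the same as the one described above, with begin and end starting at the first and last indices.

```java
class FirstVersionBug {
    // First version: begin and end start at the first and last indices.
    // Assumes list has at least two elements; with fewer, begin can
    // never equal end-1 and the loop would not terminate.
    static int firstVersion(int[] list, int s) {
        int begin = 0;
        int end = list.length - 1;
        while (begin != end - 1) {
            int mid = (begin + end) / 2;
            if (list[mid] > s) {
                end = mid;
            } else {
                begin = mid;
            }
        }
        return begin;
    }

    public static void main(String[] args) {
        int[] list = {3, 4, 6, 7, 9, 10, 12, 13, 15};
        System.out.println(firstVersion(list, 12));  // prints 6 (correct)
        System.out.println(firstVersion(list, 15));  // prints 7, but 15 is at index 8
    }
}
```

Searching for 15 stops with begin=7 and end=8 (N-2 and N-1), so the value at index 8 is never compared with the search key.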

Binary Search

Search an array of integers.

    public class BinarySearch {

        public static void main (String[] args) {
            int[] list = {3,4,6,7,9,10,12,13,15};
            System.out.println (search (12, list));   // 6
            System.out.println (search ( 7, list));   // 3
            System.out.println (search ( 2, list));   // -1
            System.out.println (search (20, list));   // 8
        }

        // Return the index of the last element in list
        // less than or equal to s.
        public static int search (int s, int[] list) {
            int begin = -1;           // start index
            int end = list.length;    // end index

            // Inv: list[0..begin]<=s && list[end..N-1]>s &&
            //      -1<=begin<end<=N    (Note: N is list.length)
            while (begin != end-1) {
                int m = (begin+end) / 2;
                if (list[m] > s) {
                    end = m;
                } else {
                    begin = m;
                }
            }
            return begin;
        }
    }

Tracing Binary Search

For each search key, record the values of b (begin), e (end) and m at each step.

    Search key: 12

               3   4   6   7   9  10  12  13  15
          -1   0   1   2   3   4   5   6   7   8   9
    b
    e
    m

    Search key: 7

               3   4   6   7   9  10  12  13  15
          -1   0   1   2   3   4   5   6   7   8   9
    b
    e
    m

    Search key: 2

               3   4   6   7   9  10  12  13  15
          -1   0   1   2   3   4   5   6   7   8   9
    b
    e
    m

    Search key: 20

               3   4   6   7   9  10  12  13  15
          -1   0   1   2   3   4   5   6   7   8   9
    b
    e
    m
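A hand trace can be checked against a small program. The sketch below is an instrumented copy of a binary search loop that starts with begin=-1 and end=N. The class name, the method name trace and the output format are all assumptions made for illustration. It records b, e and m once per loop iteration, then the final b and e.

```java
class BinarySearchTrace {
    // Returns one line per loop iteration, plus a final line showing
    // where begin and end stopped.
    static String trace(int s, int[] list) {
        StringBuilder out = new StringBuilder();
        int begin = -1;
        int end = list.length;
        while (begin != end - 1) {
            int m = (begin + end) / 2;
            out.append("b=").append(begin)
               .append(" e=").append(end)
               .append(" m=").append(m).append('\n');
            if (list[m] > s) {
                end = m;
            } else {
                begin = m;
            }
        }
        out.append("b=").append(begin)
           .append(" e=").append(end).append(" done\n");
        return out.toString();
    }

    public static void main(String[] args) {
        int[] list = {3, 4, 6, 7, 9, 10, 12, 13, 15};
        int[] keys = {12, 7, 2, 20};
        for (int key : keys) {
            System.out.println("Search key: " + key);
            System.out.print(trace(key, list));
        }
    }
}
```

For the search key 12, for example, the recorded steps are b=-1 e=9 m=4, then b=4 e=9 m=6, then b=6 e=9 m=7, and the loop stops with b=6 e=7.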

Complexity

Which algorithm is more complex? Linear Search or Binary Search? How do these functions compare?

    n          log2(n) (rounded down)
    10         3
    100        6
    1,000      9
    10,000     13
    100,000    16

So for 100,000 items, binary search saves 99,984 comparisons compared to linear search. This is an amazing improvement!! Sure beats that 50% gain that linear search with sentinel gave us.
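The log2 column can be reproduced by counting how many times n can be halved, with integer division, before it reaches 1. This is a minimal sketch, and the names HalvingCount and halvings are hypothetical.

```java
class HalvingCount {
    // Number of times n can be halved (integer division) before
    // reaching 1; this is log2(n) rounded down, for n >= 1.
    static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n = n / 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 100, 1000, 10000, 100000};
        for (int n : sizes) {
            System.out.println(n + "\t" + halvings(n));
        }
    }
}
```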

Big-Oh Notation - not on exam

We use "order of" (or Big-Oh notation) to express the time complexity of an algorithm. This is an approximation of the time that it takes to run.

Suppose that you are comparing two different algorithms that solve a particular problem. One has a worst case complexity of n+1 comparisons, and the other has a worst case complexity of n+3. What does this mean in terms of Big-Oh? We want to compare algorithms as our data set gets really large (as n increases). Which of these are faster?

    n + 1      <==>  n + 3        -->  both are O(n)
    n          <==>  2n           -->  both are O(n)
    n          <==>  n^2          -->  left is O(n), right is O(n^2)
    n^2 + n    <==>  n^2 + 300    -->  both are O(n^2)

We use Big-Oh notation to give a rough estimate of the complexity. This is usually sufficient. We drop the additive and multiplicative constants from the expression, keep only the fastest-growing term, and say that the complexity of the algorithm is "on the order of" some expression containing n.

If you count the number of operations in an algorithm, and get the following number of comparisons, what would the complexity be?

    5n comparisons                   -->  O(n)
    n + 10 comparisons               -->  O(n)
    0.6n + 33 comparisons            -->  O(n)
    5 comparisons                    -->  O(1)
    log2(n) + 1 comparisons          -->  O(log2(n))
    2n^3 + n + 5 comparisons         -->  O(n^3)
    3log2(n) + 2n + 3 comparisons    -->  O(n)

Comparing Runtime Functions

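    n           log2(n)    n^2      2^n
    1           0          1        2
    128         7          16384    10^38
    256         8          65536    Infinity
    65536       16         10^9     Infinity
    16 Meg      24         10^14    Infinity
    4096 Meg    32         10^19    Infinity

(The powers of ten are order-of-magnitude approximations. "Infinity" stands for a number too large to be worth writing down.)

Fastest to Slowest:

    O(1) --> O(log2(n)) --> O(n) --> O(n^2) --> O(n^3) --> O(n^4) --> O(2^n) --> O(3^n)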
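The 2^n column grows so quickly that ordinary int and long arithmetic overflows almost at once. As a rough check on the table, the sketch below uses java.math.BigInteger to count the decimal digits of 2^n exactly. The class and method names are hypothetical.

```java
import java.math.BigInteger;

class GrowthTable {
    // Number of decimal digits in 2^n, computed exactly.
    static int digitsOfTwoToThe(int n) {
        return BigInteger.valueOf(2).pow(n).toString().length();
    }

    public static void main(String[] args) {
        int[] sizes = {1, 128, 256};
        for (int n : sizes) {
            System.out.println("n=" + n
                    + "  n^2=" + ((long) n * n)
                    + "  2^n has " + digitsOfTwoToThe(n) + " digits");
        }
    }
}
```

2^128 has 39 digits, which matches the 10^38 entry, and 2^256 already has 78 digits.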

Analyze the complexity

Program 1: O(n^2)

    int sum = 0;
    int num = 35;
    for (int i=1; i<=2*n; i++) {
        for (int j=1; j<=n; j++) {
            num += j*3;
            sum += num;
        }
    }
    for (int k=1; k<=n; k++) {
        sum++;
    }

Program 2: O(n^3)

    int sum = 0;
    int num = 35;
    for (int i=1; i<=2*n; i++) {
        for (int j=1; j<=n; j++) {
            num += j*3;
            sum += num;
        }
    }
    for (int i=1; i<=n; i++)
        for (int j=1; j<=n; j++)
            for (int k=1; k<=n; k++)
                num += j*3;

Program 3: O(n x m)

    int sum = 0;
    for (int i=1; i<=n; i++) {
        for (int j=1; j<=m; j++) {
            sum++;
        }
    }

Program 4: O(n^2)

    int sum = 0;
    for (int k=1; k<=n; k++) {
        sum++;
    }
    for (int i=1; i<=n; i++) {
        for (int j=i; j<=n; j++) {
            sum++;
        }
    }

Program 5: O(n)

    int sum = 0;
    for (int i=0; i<n; i++) {
        if (i > n/2)
            sum += 2;
        else
            sum++;
    }

Program 6: O(n^2)

    int sum = 0;
    for (int j=1; j<n; j++) {
        sum++;
    }
    for (int k=0; k<n; k++) {
        for (int i=0; i<n; i++) {
            sum++;
        }
        for (int p=0; p<n; p++) {
            sum++;
        }
        for (int m=0; m<n; m++) {
            sum++;
        }
    }
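Since these programs only increment counters, the analysis can be checked by running one and comparing the result with a formula. The sketch below wraps Program 4 in a hypothetical method run(n). The first loop adds n and the nested loops add n + (n-1) + ... + 1 = n(n+1)/2, so the total is n + n(n+1)/2, which is dominated by the n^2 term.

```java
class Program4Count {
    // Program 4 from the notes, returning the final value of sum.
    static long run(int n) {
        long sum = 0;
        for (int k = 1; k <= n; k++) {
            sum++;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = i; j <= n; j++) {
                sum++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 100, 1000};
        for (int n : sizes) {
            long predicted = n + (long) n * (n + 1) / 2;
            System.out.println("n=" + n + "  counted=" + run(n)
                    + "  predicted=" + predicted);
        }
    }
}
```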

Analyze the complexity

1. Find the maximum in an unsorted list of n numbers. O(n)
2. Find both the maximum and minimum in an unsorted list. O(n)
3. Sort n numbers. O(n x log2(n))
4. Multiply two n x n matrices. O(n^3)
5. Linear Search (sorted/unsorted). O(n) - sorted; O(n) - unsorted
6. Binary Search (sorted/unsorted). O(log2(n)) - sorted; the algorithm does not work - unsorted
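As an illustration of item 2, here is a minimal sketch that finds both the minimum and the maximum in a single pass. The names MinMax and minMax are hypothetical, and the method assumes a non-empty array. Each element after the first costs at most two comparisons, so the total is at most 2(n-1), which is O(n).

```java
class MinMax {
    // Returns {min, max} of a. Assumes a has at least one element.
    static int[] minMax(int[] a) {
        int min = a[0];
        int max = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] < min) {
                min = a[i];
            } else if (a[i] > max) {
                // A value below min cannot also be above max,
                // so this check is only needed in the else branch.
                max = a[i];
            }
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] result = minMax(new int[] {7, 3, 9, 4});
        System.out.println("min=" + result[0] + " max=" + result[1]);
    }
}
```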