University of Toronto - Fall 2000
Department of Computer Science

## Week 12 - Complexity & Searching

```
Complexity

The complexity of an algorithm is the amount of a resource, such as
time, that the algorithm requires.  It is a measure of how 'good' the
algorithm is at solving the problem.  The complexity of a problem is
defined as the complexity of the best algorithm that solves the problem.

We've spent the whole course emphasizing correctness and
understandability, sometimes at the expense of speed.  When
we analyze complexity,
Better = faster.

But what does faster mean?

Measuring the speed of an algorithm

When you want to compare the speed of two algorithms, you can't just
implement them in programs and time the programs with a stopwatch,
because the answer would be muddled by interfering factors:
* speed of computer
* choice of programming language
* choice of compiler and execution environment
* random events caused by other programs running during
execution of the test
* quality of programming

Instead, we isolate the "important operations" - that is, the ones that
take longest - and count them.

We are really interested in the behaviour of an algorithm, as the size
of our data set increases.  We want to know how long the algorithm
takes, in terms of the size of our data set, or some other factor in
our program.  These are issues that will be examined in more detail in
CSC148 (CSCA58 at Scarboro).

Phone Book Example

Suppose we had a telephone book, b, and we wanted to look up name, x,
and suppose the book has n names.

There are several ways to solve this problem:
Algorithm 1: Linear Search
Algorithm 2: Binary Search
A heuristic...

Linear Search  (Sequential)

Start with the first name, and continue looking until x is found or
the end of the book is reached.  Then print the corresponding phone
number.

Assume that we have an array, book, that stores references to objects
containing the name and phone number of each person in the phone book.
Assume there are n names in the phone book (n = book.length).

int whereFound = -1;    // position where name found

// Go through each entry, comparing name to x
for (int i=0; i<n; i++) {
    // compare next name
    if (book[i].getName().equals(x)) {
        whereFound = i;
        break;
    }
}

if (whereFound != -1)
    System.out.println ("Number is " + book[whereFound].getNumber());
else
    System.out.println ("Name not in phone book.");

How much time does this take?  We look at the number of comparisons.
Best case:      1 step
Average case:   n/2 steps
Worst case:     n steps
Really these should be multiplied by 2 because each iteration
must compare the loop condition i<n.

Will this work if the list is unsorted?

Linear Search (on a Vector)

Here's a variation of the previous linear search algorithm.  Assume
that we are searching a sorted Vector of Strings to see if a particular
String is in that Vector.

import java.util.*;
public class LinSearch {
    public static void main (String[] args) {
        Vector list = new Vector();

        int position = search ("apple", list);
        System.out.println (position);
        list.insertElementAt ("apple", position);

        position = search ("elk", list);
        System.out.println (position);
        list.insertElementAt ("elk", position);

        position = search ("giant", list);
        System.out.println (position);
        list.insertElementAt ("giant", position);
    }

    public static int search (String s, Vector list) {
        int i = 0;

        // Inv: list[0..i-1] < s.
        // In English: everything we've looked at so
        // far is less than s
        while (i != list.size() && ((String)list.elementAt(i)).compareTo(s) < 0) {
            i++;
        }

        return i;
    }
}

Note that the above version of the Linear Search fixes the bug that was
shown during the lecture.  During the lecture we had originally shown a
version of the Linear Search method that didn't have the i!=list.size()
condition.

Note that this method does not return -1 when s is not found.  It always
returns the index where s should be inserted into the list (regardless
of whether or not s already exists in the list).  We can determine if s
was found by checking that returned_index < list.size() and then using
list.elementAt(returned_index).equals(s).

How fast is this code?  In order to reason about this, we'll need to
know: What are the time units?  That is, what operations are we
counting?  - Comparisons.  How many of them are there?

If there are n items in the list, then the loop will iterate at most n
times, and about n/2 times on average.  With two comparisons per
iteration, that makes roughly n comparisons on average.
Best Case:
Average Case:
Worst Case:

Suppose we used this algorithm by hand to search for a name in the
Toronto phone book.  Here n=1000000.  If the time to check each element
by hand (a comparison) is 0.01 seconds (a very optimistic estimate for
a search by hand), then the average time is:
n x 0.01 sec = 1000000 x 0.01 sec = 10000 sec (more than 2.5 hours!)

We can cut the number of comparisons in half by cheating: we just add
s at the end of the Vector, and then we know we'll find it before we
fall off the end.  This is called Linear Search with Sentinel.

Sentinel Linear Search (on a Vector) - faster

import java.util.*;
public class LinSearchSentinel {
    public static void main (String[] args) {
        Vector list = new Vector();

        int position = search ("apple", list);
        System.out.println (position);
        list.insertElementAt ("apple", position);

        position = search ("elk", list);
        System.out.println (position);
        list.insertElementAt ("elk", position);

        position = search ("giant", list);
        System.out.println (position);
        list.insertElementAt ("giant", position);
    }

    public static int search (String s, Vector list) {
        // Add s at the end as the sentinel, so the loop is
        // guaranteed to stop before it falls off the end.
        list.addElement(s);

        int i = 0;

        // Inv: list[0..i-1] < s.
        // In English: everything we've looked at so
        // far is less than s
        while (((String)list.elementAt(i)).compareTo(s) < 0) {
            i++;
        }

        // Put things back the way they were.
        list.removeElementAt(list.size() - 1);

        return i;
    }
}

This certainly cuts the time for Linear Search in half, but this still
takes over an hour by hand.  We can't tolerate a method that takes that
long.  We need a better method.

How much time does Linear Search with Sentinel take?
Best Case:
Average Case:
Worst Case:

Binary Search

Compare x with the middle name, y, in the book.  If x comes before y,
recursively look for x in the first half.  Otherwise recursively look
for x in the second half.  The search is finished when the beginning
and end of the list are within one.

Example

Suppose our book, b, contains these n (8) names, and we are looking for
the name, x, which is "helen."  Suppose that each name has a
corresponding phone number, although the numbers are not shown here.

  begin                   mid                             end
  0       1       2       3       4       5       6       7
[ anna    carl    doug    earl    fiona   gerard  helen   zack ]
                          begin           mid             end
                                          begin   mid     end
                                                  begin   end

We find "helen" in 3 steps with the binary search, whereas it would have
taken 7 steps with the linear search.   [log2(n) = log2(8) = 3]

The number of times you go through the loop is proportional to the
number of times you have to divide the list of size n by 2 in order to
reduce it to a list that has begin and end within one ... log base 2 of n.

How much time does this take?  (Depends on how it's implemented.)
Best case:      log2 n steps   (could be 1 step)
Average case:   log2 n steps   (could be log2 n / 2 steps)
Worst case:     log2 n steps

Assume that we have an array book of phone book entries (name & number),
and we are searching for the phone number for the name x.  Here is a
version of the binary search that initializes the counters for begin and
end to the first and last indices in the array.

// Initialization
int mid;
int begin = 0;
int end = book.length-1;

// Loop until begin and end are within 1.
while (begin != end-1) {
    mid = (begin+end)/2;
    if (book[mid].getName().compareTo(x) > 0)
        end = mid;
    else
        begin = mid;
}

// Print phone number found
if (book[begin].getName().equals(x))
    System.out.println ("Found " + book[begin].getNumber());
else
    System.out.println ("Name not in phone book.");

Note that this algorithm has a bug when you search for something that
is contained in the last index of the array, or if you search for
something that is bigger than the last value in the array.  The
algorithm stops with begin at the second-last index in the array.  We
end up with begin=N-2 and end=N-1 (where N is book.length) because the
algorithm ends as soon as begin and end are within one.

Next we'll look at an algorithm that eliminates this problem.  It will
initialize begin to one smaller (-1) and end to one larger (N).  If the
element is in the last index of the array, the algorithm will finish
with begin=N-1 and end=N (begin contains the value we are searching
for).

Binary Search

Search an array of integers.

public class BinarySearch {
    public static void main (String[] args) {
        int[] list = {3,4,6,7,9,10,12,13,15};

        System.out.println (search (12, list)); // 6
        System.out.println (search ( 7, list)); // 3
        System.out.println (search ( 2, list)); // -1
        System.out.println (search (20, list)); // 8
    }

    // Return the index of the last element in list
    // less than or equal to s.
    public static int search (int s, int[] list) {
        int begin = -1;         // start index
        int end = list.length;  // end index

        // Inv: list[0..begin]<=s && list[end..N-1]>s &&
        // -1<=begin<end<=N  (Note: N is list.length)
        while (begin != end-1) {
            int m = (begin+end) / 2;

            if (list[m] > s) {
                end = m;
            } else {
                begin = m;
            }
        }

        return begin;
    }
}

Tracing Binary Search

Search key: 12

       3   4   6   7   9  10  12  13  15
  -1   0   1   2   3   4   5   6   7   8   9
   b                                       e
                       m

Search key: 7

       3   4   6   7   9  10  12  13  15
  -1   0   1   2   3   4   5   6   7   8   9
   b                                       e
                       m

Search key: 2

       3   4   6   7   9  10  12  13  15
  -1   0   1   2   3   4   5   6   7   8   9
   b                                       e
                       m

Search key: 20

       3   4   6   7   9  10  12  13  15
  -1   0   1   2   3   4   5   6   7   8   9
   b                                       e
                       m

Complexity

Which algorithm is more complex?  Linear Search or Binary Search?

How do these functions compare?

n           log2 n (rounded down)
10          3
100         6
1,000       9
10,000      13
100,000     16

So for 100,000 items, binary search saves 99,984 comparisons compared to
linear search.  This is an amazing improvement!!  Sure beats that 50%
gain that linear search with sentinel gave us.

Big-Oh Notation - not on exam

We use "order of" (or Big-Oh notation) to express the time complexity
of an algorithm.  This is an approximation of the time that it takes to
run.

Suppose that you are comparing two different algorithms that solve a
particular problem.  One has a worst-case complexity of n+1
comparisons, and the other has a worst-case complexity of n+3.  What
does this mean in terms of Big-Oh?

We want to compare algorithms as our data set gets really large (as n
increases).  Which of these is faster?

n + 1     <==>   n + 3       --> both are O(n)
n         <==>   2n          --> both are O(n)
n         <==>   n^2         --> left is O(n), right is O(n^2)
n^2 + n   <==>   n^2 + 300   --> both are O(n^2)

We use Big-Oh notation to give a rough estimate of the complexity.
This is usually sufficient.  We drop the lower-order terms and the
constant multiplicative factors from the expression, and say that the
complexity of the algorithm is "on the order of" some expression
containing n.

If you count the number of operations in an algorithm, and get the
following number of comparisons, what would the complexity be?
5n comparisons                   --> O(n)
n + 10 comparisons               --> O(n)
0.6n + 33 comparisons            --> O(n)
5 comparisons                    --> O(1)
log2 n + 1 comparisons           --> O(log2 n)
2n^3 + n + 5 comparisons         --> O(n^3)
3 log2 n + 2n + 3 comparisons    --> O(n)

Comparing Runtime Functions

n           log2 n   n^2       2^n
1           0        1         2
128         7        16384     10^38
256         8        65536     Infinity
65536       16       10^9      Infinity
16 Meg      24       10^14     Infinity
4096 Meg    32       10^19     Infinity

Fastest to Slowest

O (1) --> O (log2 n) --> O (n) --> O (n^2) --> O (n^3) -->
O (n^4) --> O (2^n) --> O (3^n)

Analyze the complexity

Program 1: O (n^2)

int sum = 0;
int num = 35;

for (int i=1; i<=2*n; i++) {
    for (int j=1; j<=n; j++) {
        num += j*3;
        sum += num;
    }
}

for (int k=1; k<=n; k++) {
    sum++;
}

Program 2: O (n^3)

int sum = 0;
int num = 35;

for (int i=1; i<=2*n; i++) {
    for (int j=1; j<=n; j++) {
        num += j*3;
        sum += num;
    }
}

for (int i=1; i<=n; i++)
    for (int j=1; j<=n; j++)
        for (int k=1; k<=n; k++)
            num += j*3;

Program 3: O (n x m)

int sum = 0;

for (int i=1; i<=n; i++) {
    for (int j=1; j<=m; j++) {
        sum++;
    }
}

Program 4: O (n^2)

int sum = 0;

for (int k=1; k<=n; k++) {
    sum++;
}

for (int i=1; i<=n; i++) {
    for (int j=i; j<=n; j++) {
        sum++;
    }
}

Program 5: O (n)

int sum = 0;

for (int i=0; i<n; i++) {
    if (i > n/2)
        sum += 2;
    else
        sum++;
}

Program 6: O (n^2)

int sum = 0;

for (int j=1; j<n; j++) {
    sum++;
}

for (int k=0; k<n; k++) {
    for (int i=0; i<n; i++) {
        sum++;
    }
    for (int p=0; p<n; p++) {
        sum++;
    }
    for (int m=0; m<n; m++) {
        sum++;
    }
}

Analyze the complexity

1. Find the maximum in an unsorted list of n numbers.
O (n)

2. Find both the maximum and minimum in an unsorted list.
O (n)

3. Sort n numbers.
O (n x log2 n)

4.  Multiply two n x n matrices
O (n^3)

5.  Linear Search (sorted/unsorted)
O (n) - sorted
O (n) - unsorted

6.  Binary Search (sorted/unsorted)
O (log2 n) - sorted
Algorithm does not work - unsorted
```
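
The comparison counts discussed in these notes can be checked by actually counting. The following is a minimal sketch and was not part of the lecture: the class name SearchComparison, the comparisons counter, and the int-array version of linear search are assumptions made for illustration. The binarySearch method follows the BinarySearch.search method from the notes (begin starts at -1 and end at N). Only comparisons against the key are counted, not the loop-condition tests.

```java
// Sketch only: SearchComparison and its members are hypothetical names.
class SearchComparison {
    // Number of key comparisons made by the most recent search.
    static int comparisons;

    // Linear search on a sorted array.  Returns the index of the first
    // element >= s (list.length if there is none), like the Vector
    // version in the notes.
    static int linearSearch(int s, int[] list) {
        comparisons = 0;
        int i = 0;
        while (i != list.length) {
            comparisons++;          // one comparison against the key
            if (list[i] >= s) {
                break;
            }
            i++;
        }
        return i;
    }

    // Binary search as in the notes.  Returns the index of the last
    // element <= s, or -1 if every element is greater than s.
    static int binarySearch(int s, int[] list) {
        comparisons = 0;
        int begin = -1;
        int end = list.length;
        while (begin != end - 1) {
            int m = (begin + end) / 2;
            comparisons++;          // one comparison against the key
            if (list[m] > s) {
                end = m;
            } else {
                begin = m;
            }
        }
        return begin;
    }

    public static void main(String[] args) {
        System.out.println("n\tlinear\tbinary");
        for (int n = 10; n <= 100000; n *= 10) {
            int[] list = new int[n];
            for (int i = 0; i < n; i++) {
                list[i] = 2 * i;    // sorted even numbers 0, 2, 4, ...
            }
            int key = 2 * (n - 1);  // last element: worst case for linear search

            linearSearch(key, list);
            int linear = comparisons;
            binarySearch(key, list);
            int binary = comparisons;

            System.out.println(n + "\t" + linear + "\t" + binary);
        }
    }
}
```

Running main prints one row per list size. The linear column should grow in step with n, while the binary column should stay close to log2 n, which is the gap the table of n against log2 n describes.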