===========================================================================
CSC 263H             Lecture Summary for Week  1             Winter 2004
===========================================================================

-------------------
Course organization
-------------------

See course information sheet.

---------------
Data Structures
---------------

An "Abstract Data Type" (ADT) is a set of objects together with a set of
operations that can be performed on these objects. For example:

 1. Objects: integers
    Operations: ADD(x,y), MULTIPLY(x,y), etc.

 2. Stacks
    Objects: lists (or sequences)
    Operations: PUSH(S,v), POP(S), ISEMPTY(S)

ADTs are important for specification, and they provide modularity and
reuse in a program, since usage of an ADT is completely independent of its
implementation.

A "Data Structure" is an implementation of an ADT. It consists of a way to
represent the objects, and algorithms for all the operations. For example,
a stack can be implemented in many ways:

 (a) Using a linked list (we keep only a pointer to the head of the list).
     We can check that the stack is empty by testing whether the pointer to
     the head of the list is NULL. We can pop by saving the element stored
     at the head, setting the head pointer to the next node, and returning
     the saved element. We can push an element by inserting it at the front
     of the list.

 (b) Using an array together with a counter (for the size of the stack, or
     to keep track of the index of the top of the stack), with the obvious
     implementation of the operations.

In general, an ADT is a way to describe _what_ the data is, and _what_ you
can do with it, while a data structure is a way to describe _how_ the data
is stored and _how_ the operations are performed.

In this course, we will show you many ADTs, and many data structures for
these ADTs. Some of them you will have seen before, others will be new.

------------------
Algorithm Analysis
------------------

What is the "complexity" of an algorithm?
Conceptually, it is the amount of resources the algorithm uses, measured as
a function of the size of its input.

Why analyze the complexity of algorithms? In order to be able to compare
them, so that we can choose between different implementations of an ADT.
We could also use this information to determine, for example, the size of
the largest problems that can be solved on a particular machine within a
certain time.

By "resource", we will usually mean time (the running time of the
algorithm) or space (the amount of memory used by the algorithm), although
other measures are possible, and are used in practice (e.g., the number of
logic gates in a circuit, the number of bits of communication in a
network, etc.).

The "size" of the inputs will usually be measured at a high level, in a
problem-dependent fashion. For example:

    For algorithms manipulating numbers, "size" = number of bits.
    For algorithms manipulating lists,   "size" = number of elements.
    For algorithms manipulating graphs,  "size" = number of vertices.

The "running time" will usually be measured at a high level also, by
simply counting the number of "steps" taken by the algorithm, where the
definition of "step" will depend on the algorithm.

-----------------------
Worst-Case running time
-----------------------

For an algorithm A, let t(x) represent the number of steps taken by A on
input x. Then, we define the _worst-case_ running time of A on inputs of
size n to be the maximum time taken by A over all inputs of size n.
Formally,

    T(n) = max{ t(x) : x is an input of size n }

For example, consider the following simple program that searches a
singly-linked list for a specified key. (We use the notation "=/=" to
represent "not equal to".)

    ListSearch(L, k):
        z := head(L)
        while z =/= NIL and key(z) =/= k:
            z := next(z)
        return z

For this example, the "size" of the input will be simply the number of
elements in the list, i.e., length(L).
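For readers who want something executable, ListSearch might be rendered in
Python roughly as follows. This is only a sketch: the Node class, the
from_keys helper, and the comparison counter are assumptions introduced
here, not part of the notes, and counting comparisons is just one possible
choice of "step".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One cell of a singly-linked list (hypothetical helper for this sketch)."""
    key: int
    next: Optional["Node"] = None


def from_keys(keys):
    """Build a linked list from a Python sequence and return its head (or None)."""
    head = None
    for key in reversed(keys):
        head = Node(key, head)
    return head


def list_search(head, k):
    """Return (node, comparisons) for the first node whose key is k.

    node is None when k is absent. comparisons counts every evaluated test
    `z =/= NIL` and `key(z) =/= k` from the pseudocode.
    """
    comparisons = 0
    z = head
    while True:
        comparisons += 1      # test: z =/= NIL
        if z is None:
            break
        comparisons += 1      # test: key(z) =/= k (skipped when z is NIL)
        if z.key == k:
            break
        z = z.next
    return z, comparisons
```

On the list [76, 2, 15], searching for 15 uses 6 comparisons, and searching
for a key that is absent uses 7.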
When measuring the running time of ListSearch, we will count only
comparisons, since the total number of operations performed by the
algorithm on a given input is within a constant factor of the number of
comparisons performed, and comparisons are conceptually the central
operation performed by the algorithm. In this case, we can compute

    T(n) = 2n+1

since the algorithm makes at most two comparisons for every element, plus
one more when z = NIL (this happens when k is not in the list).

-------------------------
Average-Case running time
-------------------------

In general, let A be an algorithm and consider S_n, the sample space of
all inputs of size n. In order to talk about the "average" running time of
A over S_n, we need to specify how likely each input is. This is done by
specifying a probability distribution over S_n.

Once this is done, we can let t_n(x) be the number of steps taken by A on
input x, for x in S_n. Hence, t_n is a random variable (it assigns a
numerical value to each element of our probability space). From
probability theory, we know that the "average" value of t_n(x) is E[t_n],
the expected number of steps taken by A on inputs of size n, and is equal
to

    E[t_n] = sum_{x in S_n} t_n(x) * Pr(x)

where Pr(x) is the probability of input x given by the probability
distribution. Accordingly, we define the average-case running time of A to
be T'(n) = E[t_n].

Now, let's look again at the ListSearch algorithm and compute its
average-case running time. First, we need to specify the sample space S_n.
Here we are faced with an apparent problem: how can we do this when there
are infinitely many inputs of size n? Since the behaviour of ListSearch is
completely determined by only one factor (the position of k inside L),
this infinite set of inputs can in fact be reduced to one of only n+1
possibilities:

    k occurs in position 1 of L,
    k occurs in position 2 of L,
    ...
    k occurs in position n of L,
    k does not occur in L.
Hence, we can let

    S_n = { (L,k) : L = [1,2,...,n] and k is one of 0,1,2,...,n },

since this set of inputs represents every possible behaviour of the
algorithm. (For example, the behaviour of the algorithm on input
L = [76,2,15], k = 15 is exactly the same as on input L = [1,2,3], k = 3.)
Of course, there are other ways to define S_n to achieve the same effect.

Next, we must specify a probability distribution: because we have no other
information and to simplify the analysis, we will use a uniform
distribution (i.e., every input is equally likely). In this case, since
|S_n| = n+1, each input occurs with probability 1/(n+1). (Note that this
is not always a reasonable assumption: for example, a particular
application might often search for values that do not occur in the list,
so the probability would need to be higher for the input (L,0) than for
the other inputs.)

For our example, t_n(L,k) is the number of comparisons performed on input
(L,k), and it is given by the following expression:

                { 2k      if 1 <= k <= n  (then k occurs in position k)
    t_n(L,k) =  {
                { 2n+1    if k = 0        (then k does not occur in L)

(When 1 <= k <= n, the loop condition is evaluated k times, with 2
comparisons each time. When k = 0, the loop condition is evaluated with 2
comparisons for each of the n elements, and the extra comparison is the
last test for z =/= NIL when we reach the end of the list, assuming that
the "and" operator in the loop condition short-circuits, i.e., does not
evaluate the second argument when the first one is false.)

Hence, the average running time is equal to:

    T'(n) = E[t_n]
          = sum_{(L,k) in S_n} t_n(L,k) * Pr[L,k]
          = t_n(L,0) * 1/(n+1) + sum_{k=1}^{n} t_n(L,k) * 1/(n+1)
          = (2n+1)/(n+1) + 1/(n+1) * sum_{k=1}^{n} 2k
          = (2n+1)/(n+1) + 2/(n+1) * n(n+1)/2
          = (2n+1)/(n+1) + n.

Notice that this is consistent with our intuition: if we search for
elements in a list when the elements are equally likely to be anywhere in
the list, we expect that on average we have to look through approximately
half the list.
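The derivation above can be sanity-checked numerically. The sketch below
is an illustration only: it assumes Python lists in place of linked lists,
and the function names count_comparisons, average_case, and closed_form
are made up here. It computes E[t_n] exactly, by enumerating the n+1
representative inputs in S_n under the uniform distribution, and compares
the result with the closed form (2n+1)/(n+1) + n.

```python
from fractions import Fraction


def count_comparisons(keys, k):
    """Count the comparisons ListSearch makes on `keys` when searching for k."""
    count = 0
    for key in keys:
        count += 2        # z =/= NIL succeeds, then key(z) =/= k is tested
        if key == k:
            return count
    return count + 1      # the final z =/= NIL test fails at the end of the list


def average_case(n):
    """Exact E[t_n] over S_n with uniform probabilities (k = 0 means absent)."""
    L = list(range(1, n + 1))
    total = sum(count_comparisons(L, k) for k in range(0, n + 1))
    return Fraction(total, n + 1)


def closed_form(n):
    """The formula from the derivation: (2n+1)/(n+1) + n."""
    return Fraction(2 * n + 1, n + 1) + n


# The enumeration and the closed form agree on a few sample sizes.
for n in (1, 2, 10, 100):
    assert average_case(n) == closed_form(n)
```

Exact fractions are used instead of floating point so that the comparison
with the closed form is an equality test rather than an approximation.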
In fact, T'(n) is approximately equal to n, which is about half of
T(n) = 2n+1, the worst-case running time. T'(n) is slightly larger than n
(it equals n + 2 - 1/(n+1)); part of the excess comes from the extra case
when k is not in the list, which shifts the average up.

----------------------
Best-case running time
----------------------

There is another possible measure of the running time: instead of taking a
maximum or an average of the running times for all inputs of a certain
size, we could take a minimum. The best-case running time of an algorithm
A on inputs of size n is defined as:

    min{ t(x) : x is an input of size n }

What's the best-case running time of ListSearch on inputs (L, k) such that
length(L) = n? From the definition of t_n(L,k), the best scenario is that
k occurs first in the list, in which case the running time is 2
(independent of n).

It's fairly clear that the best-case time usually does not provide us with
very useful information. There are exceptions, for example if the
best-case running time is equal to the worst-case running time, but for
the rest of this course, we will mainly ignore this measure and
concentrate on worst-case and average-case instead.

-----------------------------
Lower Bounds vs. Upper Bounds
-----------------------------

Recall that there is an important distinction between proving upper bounds
on an algorithm's worst-case running time and proving lower bounds.

An upper bound is usually expressed using big-Oh notation. To prove an
upper bound of g(n) on the worst-case running time T(n) of an algorithm
means to prove that T(n) is O(g(n)). This is roughly equivalent to proving
that

    T(n) = max{ t(x) : x has size n } <= g(n).

How can we prove that the maximum of a set of values is no more than g(n)?
The easiest way is to prove that _every_ member of the set is no more than
g(n). In other words, to prove an upper bound on the worst-case running
time of an algorithm, we must argue that the algorithm takes no more than
that much time on _every_ input of the right size.
(In particular, you _cannot_ prove an upper bound if you only argue about
one input, unless you also prove that this input really is the worst -- in
which case you're back to proving something for every input!)

A lower bound is usually expressed using Omega notation. To prove a lower
bound of f(n) on the worst-case running time T(n) of an algorithm means to
prove that T(n) is Omega(f(n)). This is roughly equivalent to proving that

    T(n) = max{ t(x) : x has size n } >= f(n).

How can we prove that the maximum of a set of values is at least f(n)? The
easiest way is to find one element of the set which is at least f(n). In
other words, to prove a lower bound on the worst-case running time of an
algorithm, we only have to exhibit one input for which the algorithm takes
at least that much time.
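The asymmetry between the two kinds of argument can be illustrated on
ListSearch. The following is a sketch under assumptions: Python lists
stand in for linked lists, and the names count_comparisons,
lower_bound_witness, and upper_bound_holds are hypothetical. For the lower
bound it exhibits a single input on which 2n+1 comparisons are made; for
the upper bound it has to examine every representative input of size n.
Running it for particular values of n is only a sanity check, not a proof
for all n.

```python
def count_comparisons(keys, k):
    """Count the comparisons ListSearch makes on `keys` when searching for k."""
    count = 0
    for key in keys:
        count += 2        # z =/= NIL succeeds, then key(z) =/= k is tested
        if key == k:
            return count
    return count + 1      # the final z =/= NIL test fails at the end of the list


def lower_bound_witness(n):
    """One input of size n on which ListSearch takes 2n+1 comparisons.

    L = [1,...,n] and k = 0, so k does not occur in L.
    """
    return list(range(1, n + 1)), 0


def upper_bound_holds(n):
    """Check t(x) <= 2n+1 on every representative input (L,k), k = 0,...,n."""
    L = list(range(1, n + 1))
    return all(count_comparisons(L, k) <= 2 * n + 1 for k in range(0, n + 1))


n = 8
L, k = lower_bound_witness(n)
assert count_comparisons(L, k) >= 2 * n + 1   # one input suffices here
assert upper_bound_holds(n)                   # every input must be checked here
```

The lower-bound check looks at one input, while the upper-bound check
loops over all n+1 representatives, mirroring the "one input" versus
"every input" distinction described above.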