The story of the course so far

Relevant reading of the textbook (Goodrich and Tamassia): Chapter 1.1, 1.2, 1.3, 1.4 (pages 5--33)

We discussed two important and related concepts. Abstract Data Types (ADT)) and data-structures. ADTs are objects together with a set of operations that can be performed on them. A data-structure is the representation and implementation of an ADT. It is important to understand that the ADT itself is independent of its implementation.using a linked list

How do we analyze complexity? We wish to relate the resource needed to run an algorithm to the SIZE of the input on which hit is run.

resources: typically means time. More specifically, we will count teps or certain operations (comprisons, assignments) as the measure of time, and the exact choice will depend on the context.

size are typically measuredby the significant units of the object. For example, in a list the size will be the length of the list.

We can now define t_A(x) is the number of steps an algorithm A takes when its input is x. We can more simply call this t(x) remembering that this relates to a certain algorithm A. a worst case analysis considers the input of a certain length with the biggest possible running time. Formally, we define T(n) = max {t(x) : x is of size n}

Another option is to talk about T'(n) = Avg{t(x)} where the average is done on some probability distribution over inputs. This is called average case analysis.

we described a simple method to search a key in a list and saw that T(n) = 2n+1, but when we consider some reasonable distribution on possible inputs we get that T'(n) ~ n+1, namely, roughly half the worst case running time.

We also considered the standard method to calculate the maximum element of an array, and saw that T(n)=n. (Here we counted the number of assignments of the "current-maximum".)

To talk about average case complexity of calculating maximum, we had to define a distribution. The most natural one here is the uniform distribution over all arrays of size n that contain all elements 1,2,...,n. We then charcterized the number of assignments: it is the number of elements that are bigger than all preceeding ones.

We used an indicator variable X_i = 1 if element i is bigger than element 1,2,3,...,i-1, and otherwise X_i = 0. Now X = sum X_i is the number of assignments. Since we need to find the expectation of X, we look at E[X] = E [sum X_i] = sum E[X_i], where we use here linearity of expectation.

We now need to notice that E[X_i] = Pr[X_i =1] = 1/i (do you see why the previous two equalities are correct?) and so we get that E[ X] = E [sum X_i] = sum E[X_i] = sum_i 1/i = Theta(log n).

Relevant reading: Chapter 4.3 (quick sort)

We discussed sorting: taking as input a sequence of numbers and returning a sorted (increasing) version of that list. Instead of numbers we could have any set of items that come from a totally ordered universe (so strings is another example).

What is a naive way to sort? Take minimum element, place it in first location, take minimum of the rest, place in second location, and so on. This will take Theta(n^2).

We considered another algorithm called QuickSort: We pick the first element of the sequence as "pivot" element, and split the rest of the elements to two sets, L which are elements that are Less than the p, and G which are Greater thn p. We sort recursively each one of these sets, and produce the sorted version of L then p and then the sorted version of G.

We were disappointed to see tht this algorithm performs, in the worst case, like the naive algorith, that is has Theta(n^2) comparisons (we count # of comparisons for the runtime of the algorithm). Interestingly, the worst case input are sorted sequences. In A1 you are asked to show that even if the sequence is "almost sorted" the running time is still Omega(n^2).

In spite of that quicksort is is the most used sorting algorithm, why? We tried to get a sense for the average case complexity. Again, we considered the uniform distribution over all arrays of size n that contain all elements 1,2,...,n. It can be shown that the average number of comparisons now is O(n log n)! The intuition behind the analysis is that in a atypical case the pivot will split the elements to two sets L and G which are fairly balanced. MOre precisely, it is very likely that the size of L and of G are at least n/10 and at most 9n/10. We showed that if this were always the case we could get the above bound. Doing this analysis properly is a little bit more involved and we didn't continue this line.

We moved to discuss randomized quicksort, in which the pivot is chosen randomly. In a randomized algorithm we typically measure the expected running time. Remember that now the running time (or # of comparisons, or any other measure) is a random variable. It is not very hard to see (although we didn't actually show it) that this average # of comparisons of randomized quicksort is the same as the expected # of comparisons of the determinsitic quicksort.

The actual analysis we gave was as follows. Let X_ij = # f comparisons between i and j (we may assume that the elements in the sequences are 1,2,...n). We alrady now that every two elelemtns can be compared at most once. So X_ij is 0 or 1. Notice that sum X_i is the number of comparisons we want to measure. Now E[X_ij] = Pr [X_ij = 1] = Pr [i and j are compared].

What is Pr [i and j are compared]? Consider the first point where a pivot from {i, i+1, i+2,...,j} is chosen. (Do you see why there has to be such a point?) If this pivot is i or j then i and j will be compared and otherwise they will not. So Pr [i and j are compared] = 2/(j-i+1). Now we get that E[sum_i sum_{j>i} X_ij] = sum_i sum{j>i} E[X_ij] = sum_i sum{j>i} 2/(j-i+1} <= sum_i log n <= n log n.

Relevant reading: Chapter 3.1. Dictionary ADTs and BSTs Dictionaries are Common Data Types when dealing with data with order. Example: employees with their Social Insurance Numbers. Formally, the objective is to be able to look-up data quickly by KEYS associated with it, and to allow addition and removal of items. Object x contains a key and possibly some data items describing it. key[x] comes from a totally ordered universe. The operations we want to support are

search(D,k) which should return a pointer to element x such that key[x]=k (or NIL if none exists).
insert (D,x) Add element x to D.
delete (D,x) delete element x from D.

For dictionary datastructures,arrays or linked lists are one possibility. Consider haveing an unsorted array. Then clearly, the worst case time for searching a key is Theta(n). If, instead, the array is ordered, then search time goes down to O(log n) (binary search), but insert and delete will require linear time. operationArrays are one option some options for data structure

our goal is to achieve a data-structure supporting ALL operations in time O(log n).

We considered trees with nodes that are the elements (each having a key). A "search tree" is one that has the "search property": for every node x, the keys in the left subtree of x are smaller than the key[x], and the keys inth right subtree of x are bigger than key[x]. The depth of a node x in a tree is the length of the (unique) path from root to x. the height of a tree is the maximal depth of a node.

We saw a simple method for the search operation, and that takes time O(H) here H is the height of the tree. It can be shown (see tutorial) that insert and delete, and in fact all "reasonable" operations can be perfomrmed in worst-case time O(H). The question then becomes "how nig can H be?".

In general a tree with n nodes can have height as large as n-1. On the other hand, a tree that is totally balanced will have depth O(log n). The hope is to be able to restrict the tree one gets in the data-structure to balanced trees.

Relevant reading: Chapter 3.2. AVL trees See also a useful writeup about AVL trees by Vassos Hadzilacos.

an AVL tree is a search tree (with the "search property") that is height balanced: this means that for all node x the height of the left subtree of x and the height of the right subtree of x differ by at most 1. It can be shown that such a tree has height O(log n). Therefore if we can maintain this property for our search tree through insertions and deletions (that are allowed to work in time O(H), we would get that all these operations can be done in time O(log n).

typically, an AVL DS will maintain information about the height of their left and right subtrees (and in particualr this easily gives the "balance factor" of a node: how much is the left ST higher than the right ST (can 1,0,-1 for a height balanced tree).

Search is performed precisely like the general BST setting. Except now a worst case running time of O(log n) is guaraneed.

Insertions dn deletions are much more tricky. We first "ignore" the balance property, and apply an insertion that may violate balance. We then perform "rotations", that is changing the structure of the tree (while keeping the same set of elements) so as to

Respect the search property.
make the tree balanced.

The rotations involve some meticulous case analysis and trakking of the balance factors of different nodes.

Relevant reading: Chapter 2.4. Priority queues and Heaps

The priority Queue ADT is a ssociated with a set of keys where smaller keys signifies "higher priority". The main objective is to answer "highers priority (or minimum key)" queries fast.

Formally we want to support the following operations on Queue Q

Lookup-Min(A) which should return an element with minimal key.
ExtractMin(Q) which returns an element with minimal key AND removes it.
Insert(Q,k) Insert an element with key k.
Size(A) returns the number of elements in Q.

The heap Data structure is a complete binary tree that is represented as an array. "complete" means that the tree has all its levels full except perhaps the last level which is "full to the left".

The array represents the tree (so no pointers are involved) in a simple way: the root is at location 1. Its left and right children are in locations 2, 3. The 4 grandchildren are in location 4,5,6,7, etc.

This allows for a simple way to find the idex in the array of a parent and children of a particualr node: parent(i) = floor(i/2). leftchild(i) = 2*i, rightchild(i)=2*i+1.

In addition to this complete-tree representation a heap mnaintains the "heap property" which is that for all nodes x (other than the root) the key in node x is at least as big as the key of its parent.

How do we insert? The idea is that we add the element at the first empty location of the array, thus making it the "rightmost" leaf. This creates a proper complete tree with a proper array representation alas it may violate the heap-property. The violation can only occur with respect to that new key.

We now perform "up-heap bulbble" which meas that we exchage the node of the new key with its parent if there is a violation, and continue this way until there is no more violation. We proved that this will make the whole tree satisfy the heap property at the end of this procedure.

A similar treatment would support extract-min. We start with exchanging the last array element with the root (which we delete). Now we have a complete tree with the right set of elelemtns, but the root may violate the heap-properrty. We perform a "down-heap" bubble which means that we exchange the new key with its child of smallest key, as long as this smallest key is smaller than the new key. Again it is easy to show that at termination of this process we have a valid heap.

How long do these operations take? LookupMin takes O(1) time. As for insert and extract-min.The first phase before the bubbling takes only O(1) time. The "bubbling phase" performs an iteration that can only be as long as the height. Now, since this is a complete tree it is obvious that the height is O(log n) where n is the number of elements.

heap-sort : How do we use this data-structure in order to perform soriting? Recall the naive algorithms that extract the minimum elelemtn, places it first, and continue with the next min, placing it second etc. This algorithm takes Theta(n^2) time since the "min" operation takes linear time on an arbitrary sequence. However, if all items of a sequence to be sorted were organized in a heap we could perform the min extraction in logarithmic time.

So here is a simple sorting algorithm, called heap-sort: insert the n elements one by one into an initially empty heap using the method "insert". Then repeat extract min n times and the produced items will be sorted. Since there are n insert operations and n extract-min operations, and there are never more than n items in the heap this clearly gives O(n log n) time sorting algorithm. Binomial heaps Not in text book. HEre is the chapter about Binomial Heaps from CLRS What if we want to take the union of two heaps? In tutorial you saw that binary trees which are heaps can easily do that, but only when comprimising on the property that the tree is complete. So the heap structure we described above will not do. We now consider a more flexible structure that maanges to answer all queries including union in time O(log n).

A binomail tre is defined recursively. B_0 is the tree with one node, and B_k takes two copies of B_{k-1} and connect the root of one of the copies to the root of the second. We saw some important properties for such objects.

The height of B_k is k.
There are 2^k nodes in B_k.
The root of B_k has k children, and all other nodes has less children.

Now, the heap itself, called Binomial heap contains a collection of such binomial trees, where we insist on having at most one copy of each B_k. Notice that this means a unqiue way of representing a certain number of elements as the sum of sizes of binomail trres. For example, 19 elelemtns can be composed of B_4 (of size 16), B_1 (of size 2) and B_0 ( of size 1) and this is the only way it can be represented. Notice that this is similar to the way we write 19 in binary: 19 = 10011.

More specifically, we link all the roots of the corresponding binomial trees in order to represent the heap. Each binoomial tree contains keys that satisfies the heap proporty. This list is no longer than floor(log n)+1 which can be verified by looking at the above connection with binary representation of numbers.

Notice that Lookup-min does no longer work in constant time: it will go over all roots of binomial trees in the list, so the running time will be O(log n). [There is a way to make it O(1) by maintaining a pointer to the minimum at all times].

How do we find the union of two binomail-heaps? We start with Binomial-heap-Merge, which simply merges the two lists of binomial trees in increasing order. Notice that now we can no longer guaranee that the list contains at most one copy from each B_k. To remedy this we should perform an operation that replaces two copies of B_k into a single copy of B_{k+1}. Such an operation takes O(1) for every k: simply connect the root with higher key to the root with lowesr key. This in total would take O(log n) for the whole list.

To insert key k into Heap H it is possible to use the union method: just create a single bionial tree with one node that has the key to be inserted as one heap H', and now take Union(H,H').

For extract-min(H) you first need to find the min element x (use lookipMin). Now let H1 be H without the tree rooted at x (let's say it's B_k). H2 will be the result of decomposing the tree of x after the removal of x. Specifically, we just take all children of x, which are B_0, B_1, B_2, B_{k-1}. There is n operation that we need to do other then to link them together to make the heap H_2. (This is simpler than what I described in class). See example on page 469 in CLRS.

Hashing. Relevant reading. 2.5.2, 2.5.3, 2.5.4, 2.5.5

imagine going over a file which is a sequence of "huge" numbers, each a 32-bit number, where you would like to store the numbers and have a very fast access to them, say as fast as O(1). A simple solution of a direct-access table, in which a number is stored in an array entry with index that equals the key would be reasonable in other case where the "universe" is of small numbers. In this situation such a solution is downright impossible, as the array size cannot be as large as 2^32.

One of the most efficient ways to implement a dictionary in these circumstances is to use a hash table . We will se that it has worst case search time of O(n) but neverthelss is very useful as on average it will have search time of O(1) (much better than searchtrees).

Here is the set up:

the possible universe of keys is the set of all possible keys we may ever encounter. In the above example it is all 32-bit numbers. We call this universe U and denote N = |U|.
An array (table) T with some "reasonable" number of positions m (think of m as in the thousands) is allocated and the keys we encounter are mapped into one of those positions. This array is called a hash-table.
We consider a hash function which is a function h, h : U -> {0,1,...,m-1} that maps keys to positions in the hash table so that for each possible key k in U, k will be stored in T at position h(k).

To store an element we simply store it in T[h(k)] where k is the key. To look for an element we find it in T[h(k)]. So this sounds like an O(1) solution. What is the catch? collisions...

If two different keys k1 and k2 hash to the same location (h(k1) = h(k2)), we say that they are in collision. This is a problem since we would like each position to have at most on elelemnts to be able to easily argue the O(1) running time of storing and finding elelemtns. Note, since N is much bigger than m we cannot hope to avoid the problem of collision.

We need to (i) pick hash functions h that will make collisions less likely. and (ii) design a method to deal with collisions.

Let's start with point (i). First we examine the case where the universe does not contain numbers. It is useful to first transform the objects into integer numbers. For alphanumeric strings we may sum the ASCII values of all characters invovled. We point to the inherent problem with this solution: many words are simply differnt permutations of the character of others (sport, pots, stop, opts) and so the likelihood of collisions increases for no good reason. one alternative is to take a Polynomial hash code (see page 118 in the text).

the mod opertion is a natural ingredient in hash function. Notice, for an integer x, x mod m will always be a number in the range 0,1,...,m-1. Notice that if m is, say, 1024 = 2^10, then x mod m will simply take the last 10 bits of x. Since pattern may occur making certain part of a strings fairly fixed (think of the first 2 digits of your student numbers...) this is a poor choice that leads to frequent collisions. A typical solution is to take m to be a prime.

How to deal with (ii)? There are two main approaches. closed addressing and open addressing

closed addressing (or chaining means that x ends up in position h(x) and no exceptions to that are allowed. But since collision may occur, this implies that one position must store more than one object in general. So, instead of storing each key k directly at location h(k), we store a simple unordered singly-linked list of keys at each location in table T (so that we can accomodate collisions by storing each key that hashes to the same location in one linked list).

open addressing is a collision-handling strategies that uses a predetermined rule to calculate a sequence of buckets for key k: A_0(k),A_1(k),A_2(k),... into which the scheme would attempt to store an item. This list of possible buckets is called a "probe sequence". The item is stored in the first empty bucket in the above sequence. It is obvious that in order for this strategy to work, the number of itms n must never exceeds the size of the hash table (notice that this is not a requirement in the chaining method. Another desirable property is that the sequence A_0(k),A_1(k),A_2(k)...A_{m-1}(k) will cover all buckets. With this propery one can guarantee that whenever the hash table is not full, and key will be able to be inserted.

In Linear Probing the rule is simply A_i(k) = (h(k) + i) mod m for i = 0,1,2,... In other words, at first the item is attempted to be stored in bucket h(k), and if this is not possible it is going to the closest bucket moving right in a wraparound fashion. Linear probing presents a serious problem: long contigous full buckets start to emerge. This is due to the fact that an attempt to store in a full bucket i will always be follows by an attempt to store in bucket 9, regardless of hatthe key of the item is.

When we instead move to non-linear probing , say, full bucket 8 will imply that the next bucket will be 9, regardles of the key.

nonlinear-probing do not suffer this problem. For example quadratic probing defined as A_i(k) = (h(k) + i^2 ) mod m for i = 0,1,2,... We saw that this choice is problematic since the sequence of buckets will be roughly half of m, which means that a half-full table may not allow for some insertions.

In tutorial and in assignmnent we saw some variants of linear and quadratic probing.

Other open-addressing adressing include double hashing where he probe sequence is A_i = (h(k) + j * h_2(k)) mod nm for j = 0,1,2,... Notice that it is critical that h_2(k) is never 0.

Average case complexity. We will assume the reasonable "simple uniform hashing" assumption. This assumption says that a random key in the universe is equally likely to hash to any bucket, i.e., Pr[h(x)=i] = 1/m for i = 0, 1, ..., m-1. (Notice that this is also equivalent to saying that each bucket can me mapped into from the same number of universe elements.

Another important concept is the "load factor" alpha = n/m. This is simply the expected number of items in each bucket. Obviously for open addressing we must have alpha <= 1.

Averag case complexity for chaining is analyzed in the following model: The table contains n items with no assumption about how they are distibuted in buckets, and a random key from the universe is inserted. Observe: since typically N is much bigger than n, this means that the search is typically unsuccessful and the key should be added to the table. It is easy to see, that under the simple-uniform-hashing assumption, the average time to search here is O(1+alpha). So as long is alpha is constant (1/10, for example, or 8), we have constant running time on average.

It is reasonable to look at a different setting, in which we pick a random key from the n that are already in the table, that is to perform a random successful search. Here the analysis is different, as longer chains are more likely to be searched, simply since they contain more keys. If L_i is the length of the chain in the i'th bucket, what we get is sum_i( L_i^2)/2m +1/2. If the L_i are distributed in a very nonuniform manner, this may be much more than. If for example, only B buckets are nonempty then it can be seen that average time is at least n/B = alpha * m/B (can you see why?). If, however, we consider the simple-uniform-hashing assumption, the L_i are roughly the same, and we get roughly alpha average running time again.

For open-addressing, the model we choose is the following this choice has to do with what can be reaonably analyzed...): The randomness is applied to the choice of Hashing function and probing mechanism. Specifically, the probe sequence for a key is equally likely to be any one of the m! permutations of {0,1,...,m-1}. With this in mind, the question of how long it will take before an empty bucket is reached is a fairly simple probabilistic question. It is not hard to show that the running time we get is 1/(1-alpha). Notice that we see that not only does alpha needs to be at most 1, but also that it should be far enough from one so that constant running time can be obtained. If alpha=0.99, for example, the running time behaves similraly to 100.
Disjoint Sets. (reading: section 4.2)

The disjint set ADT has a collection of sets as objects. These sets are nonempty and disjoint. A set is identified by a unique element that acts as its representationve. The operations this ADT supports are

MakeSet(x) which creates a set contining only element x (which is new element)
FindSet(x) which returns the representative element of the set contining x (or NIL if x is not an element of any set in the collection).
Union (x,y) which replaces the sets containing elements x and element y with the union of these sets. If x and y are not existing elements or if they belong to the same set, it does nothing.

One common application of th Disjoint Sets ADT is finding connected components in a graph. The Connected componenes of a graph is the a collectin of disjoint and nonempty sets of vertices that together cover the set of vertices of the graph, and vertices x,y are in the same set if and only if they are connected (through a path). To create the components we start by performing MakeSet(v) to all vertices v of the graph. Then, for all edges xy we perform Union(x,y). It is easy to show that the obtained set are the connected componenets.

The setup we consider when talking about running times, is that we have m operations, n of them makeset operations. Notice that this means that there are n elements in the union of all sets at the end of this sequence.

data-structres.I: circular list each set is represented as a circular list, with a special flg representing the rpresentative element. Make set creates a list containing one element pointing to itself. FindSet(x) will follow the pointers from x until reach the representative element. Union(x,y) will first perform FindSet(x) and FindSet(y) and if they are different perform the onion of the chains, by replacing the pointers of x and y (if x points to a and y points to b, x will point to b and y will point to a). In addition it will unflag one of the represnetative, say alwys the one of the second argument, y.

the running time for MakeSet is O(1), for Findset is O(size of set) and for Union is O(1) for joining of the lists, but in addition 2 applications of FindSet must be performed. Since the size of a set can be Theta(n) this is the cost of the running time for the operatons FindSet and Union.

We often look at amortized running time. Here we consider the cost of the average operation in a sequence. Specifically, we look at the worst-case over all possible sequences and divide to the length of the seequence. Clearly, amortized time is no more than usual worst-case time for a operation, since the average of numbers is never more than their maximum. Amortized analysis is usefu since sometime some operation may be very costly, but no sequence will have many costly operations consistently.

The circular-list data structure has bad running time even in the amortized sense. For example, consider n make set operations of elements 1,2,...,n. Then perform union of (1,2) (1,3)...(1,n). Then, for the rest of the operations perform FindSet for the element pointed to by the representative element. For concreteness, we may set m = 2(2n-1) and so half the operations will be the FindSet operations, and each will make n-1, and so the average will be Theta(n) = Theta(m).

II: linked list with pointer to front each set is represented as a linked list, with the representative element at the head of the list. All elements In addition, we have a pointer to the tail of the list. MakeSet is still very simple and obviously takes O(1) time. FindSet simply follows the "back-pointer" to the head of the list, hence trivially takes O(1).

For union, we use FindSet and then if the sets are different create the list which is the union of the two lists. This, using the tail of one of the list, takes only O(1). But we also have to change the back-pointer of the list we appended. This takes as long as the size of that list. In total worst-case running time is O(n). Even amortized running time is bad as we can imagine using elements 1,2,...,n, and then taking union (1,2), union(1,3), union (1,4) where always we append the set of the LEFT argument to the list of the RIGHT argument. It can be showin that for the i'th union we will have to perform i back-pointer changes, so in total we have 1+2+3+...+n-1 = Theta(n^2) which is this case is Theta(m^2). Hence the amortized running time is Theta(m).

Trying to fix the problem with DataStructre II, we come up with datastructe III which is the same as DS II except union are done "by weight" in the sense that smaller lists get appended to larger lists. While the worst case after this correction is still O(n) it can be shown that on a sequence of m operations, n of which make set, the worst-case runing time is O(m + n log n). In other words, the amortized cost can be bounded by O(log n).

Moving onward, to DS IV, we discussed trees to be used to represent sets. Now each set is a tree, so one can think of these data-structures as forests. The root of the tree holds the representative element of the set, and all elements have a pointer to their parent, except the root that has a pointer to itself. Note that no "children" pointer is needed. Makeset is obvious as usual. To implement FindSet we simply follow the parent links until we reach the root whose element we report. For linking two trees in the Union operation we connect the root of one of the trees to the other (note: these are not supposed to be binary trees). The cost of FindSet and Union we be bounded by the height of the trees. But it is easy to show that in the above simple method, it is possible to get trees which are "long and narrow" or even paths, hence the height can be as much as Theta(n), even amortized.

Our goal is therefore to obtain trees with small height. In DS V, "trees with union by weight" We use a similar idea to before, where we choose to always link the smaller tree to the bigger (it is easy to keep a variable holding the size of the tree). This ensures that the height of the trees are logarithmic. leading to O(log n) time for FindSet and Union.

The most advanced data-structures build on these ideas. We first explore the idea of Path-Compression. Here, whenever we follow pointer up to the root (in findSet or Union) we change their parent pointer to point to the root (that, after we finish first pass and actully discovered the root). This clearly does not change the asymptotcis of the operation, but may potentially effect the running time for the future, as a few nodes that used to be far from the root re now children of the root. It is possible to prove that the worst-case running time of a single operation in a sequence, if there are n MAKE-SET operations (so at most n-1 UNIONs) and `f' FIND-SET operations is Theta( f log n / log(1+f/n) ) if f >= n and Theta( n + f log n) if f < n

To conclude this chapter we discuss a DS that uses the Path-Compression and performs "Union By Rank". This means that instead of maintaining the size of the tree and basing link decision on it, we do the same with Rank. Rank is the same as height if no Path Compression operations applied. Specifically, to link two roots with different ranks r>s, we connect the root with rank s to the root with rank r, and keep the rank of the obtined tree r. If both roots have the same rank r, we connect one to the other and the rank of the obtained tree is defined to be r+1. It can be shown that the rank is always at least as large as height (rank is not updated during path compression). The running time we get with this data-structre is mlog*n for a sequence of length m, n of which FindSet. Log*n is a very slowly growing function, and is the number of times "log_2" can be applied to a number repeatedly until it is <= 1.
Graphs (Reading: Chapter 6)

A graph G=(V,E) consists of a set of "vertices" (or "nodes") V and a set of "edges" (or "arcs") E. In a "directed" graph, each edge is a pair of nodes (u,v), and the pair (u,v) is considered different from the pair (v,u) (we will typically not allow self-loops (edges of the form (u,u)). In an "undirected" graph, each edge is a set of two vertices {u,v} (so {u,v} and {v,u} are the same). A "weighted" graph is either directed or undirected, and each edge e in E is assigned a real number w(e) called its "weight" (or sometimes "cost").

Operations on graphs:

Add/Remove a vertex/edge.
Edge-query: Is there an edge between two vertices?
What is the neighbourhood of a vertex u. We use the following defs: N(u) = {v: {u,v} is an edge} in the undirected case, and N_in(u) = {v: (v,u) is an edge}, N_out(u) = {v: (u,v) is an edge} in the directed case.
degrees of vertices. Simply size of the above neihbourhoods.
Traversal: visit each vertex of a graph to perform some task.
Some more useful definitions: a simple path between vertices u and v is a sequence of distinct vertices v_0,v_1,..v_r such that v_0 = u, v_r = v, and for i=1,...,r, we have {v_{r-1},v_r} is an edge (undirected sense) or (v_{r-1},v_r) is an edge (directed sense). The length of the path is r (number of edges in it). The distance between to vertices is the length of the shortest path between them if such a path exists, and infinity otherise.