================================= WEEK 7 ====================================

Basic graph-theoretic definitions
---------------------------------

  * A *graph* G = (V,E) consists of:
      - a set V of *vertices*, and
      - a set E of *edges*.
    By convention, we let n = |V| be the number of vertices and m = |E| be
    the number of edges in a graph.

  * In a *directed* graph, each edge has a direction (it is an ordered
    pair), so the edge (u,v) is distinct from the edge (v,u), and
    "self-loops" (edges of the form (v,v)) are allowed. In an *undirected*
    graph, each edge is a set, so {u,v} and {v,u} are the same edge and
    self-loops are not allowed. To simplify the notation, we will use (u,v)
    to denote edges even when they are not directed.

  * For the rest of this section, we will talk about *undirected* graphs
    unless we explicitly say otherwise.

  * Two vertices u and v are *adjacent* if the edge (u,v) belongs to E.
    Similarly, two edges e_1 and e_2 are *adjacent* if they have a vertex
    in common. A vertex v and an edge e are adjacent if v is one of the
    vertices of e.

  * The *degree* of a vertex v is the number of edges adjacent to v. For
    directed graphs, the *in-degree* of v is the number of edges coming
    into v and the *out-degree* of v is the number of edges coming out of v.

  * A *path* in G between two vertices v_0 and v_k is a sequence of edges
    (v_0,v_1), (v_1,v_2), ..., (v_{k-1},v_k) such that (v_{i-1},v_i) is an
    edge of E for i = 1,...,k. The *length* of a path is the number of
    *edges* on it. A path is *simple* if all the vertices on the path are
    distinct (which implies that the edges are all distinct also).

  * A *circuit* of G is a closed path (with the same first and last
    vertices) that contains no repeated edge. A *cycle* is a circuit that
    contains no repeated vertex (except for the first and last). The *size*
    of a cycle is the number of vertices in it.
  * A graph G is *connected* if there exists at least one path between
    every pair of vertices.

  * A graph G is *acyclic* if it does not contain any cycle.

Graph representations and data structures
-----------------------------------------

  * Two standard representations: adjacency matrix, and adjacency lists.
    By convention, vertices are usually represented as index numbers from
    0 to n-1 (but not always). In that case, an adjacency matrix is simply
    an [n x n] array whose (i,j)-th entry (row i, column j) indicates
    whether the edge (i,j) is part of the graph or not. Adjacency lists are
    used most often for *sparse* graphs (graphs with few edges), for which
    storing an entire adjacency matrix would be wasteful: for each vertex i
    in the graph, we store a list (as an array, or a linked list) of the
    vertices to which i is adjacent.

  * For example, here is a directed graph on the vertices A to J:

        [Figure: drawing of the directed graph with edges
         A->B, A->C, A->D, B->E, D->E, E->C, E->G, E->H, F->D, F->E,
         G->H, G->I, H->F, H->I, H->J, J->F, J->I]

    The above graph can be represented by the following adjacency matrix
    (row = source vertex, column = target vertex):

           A B C D E F G H I J
           -------------------
        A|   x x x
        B|         x
        C|
        D|         x
        E|     x       x x
        F|       x x
        G|               x x
        H|           x     x x
        I|
        J|           x     x

    Here is the same graph as adjacency lists, in the "array of linked
    lists" form (left) and in the "array of arrays" form (right):

        A -> B -> C -> D -> nil        A -> [B, C, D]
        B -> E -> nil                  B -> [E]
        C -> nil                       C -> []
        D -> E -> nil                  D -> [E]
        E -> C -> G -> H -> nil        E -> [C, G, H]
        F -> D -> E -> nil             F -> [D, E]
        G -> H -> I -> nil             G -> [H, I]
        H -> F -> I -> J -> nil        H -> [F, I, J]
        I -> nil                       I -> []
        J -> F -> I -> nil             J -> [F, I]

Depth-First Search (DFS)
------------------------

  * The idea is to travel along edges "as far as possible" before
    backtracking.
    For example, in the directed graph above, the vertices could be
    traversed in the indicated order:

        [Figure: the same graph with each vertex labelled by its position
         in the DFS order: A (0), D (1), E (2), H (3), J (4), I (5), F (6),
         G (7), C (8), B (9)]

  * What information do we need to keep track of in order to do this
    properly? We need to know which vertices have been visited and which
    ones haven't, to make sure that we don't go around a cycle forever. In
    order to do backtracking properly, we also need to remember the
    sequence of vertices we've taken to get where we are, i.e., the
    "current path", and for each vertex on this path, we need to remember
    which edge to visit next.

  * What kind of data structure would be appropriate for doing all this?
    First of all, we can use a simple array of booleans to know which
    vertices have been visited: initially, this array is set to all
    "false", and the values are updated as we visit new vertices. How do we
    keep track of which vertex to visit next? Use a stack: when we visit a
    new vertex (such as when we start), push onto the stack every
    unvisited neighbour of the current vertex. Then keep popping the top
    vertex, "visiting" it if it hasn't been visited yet, until the stack is
    empty, at which point every vertex reachable from the start has been
    visited.

  * Here's the algorithm:

        DFS(G,v)  ( v is the vertex where the search starts )
            Stack S := {};  ( start with an empty stack )
            for each vertex u, set visited[u] := false;
            push S, v;
            while (S is not empty) do
                u := pop S;
                if (not visited[u]) then
                    visited[u] := true;
                    for each unvisited neighbour w of u
                        push S, w;
                end if
            end while
        END DFS()

  * Note that this algorithm does *not* currently keep track of the edges
    that were used to visit new vertices. To do that, we can simply use
    another array predecessor[u] where we store, for each vertex u, the
    vertex that u was visited from.
    More precisely, set predecessor[u] to `nil' initially for every vertex
    u, and every time that we push a vertex w onto the stack S, while
    currently "visiting" vertex u, update predecessor[w] to be equal to u.

  * Note that if there is more than one connected component in G (i.e.,
    different "subgraphs" of G that have no edge between them), it is
    possible that DFS as given above will not visit every vertex in G. If
    our goal is to traverse the entire graph G, then we can change DFS to
    "restart" on an unvisited vertex after the main loop is done, if there
    are unvisited vertices left, and to keep doing this until the whole
    graph is visited (see the readings for details).

Breadth-First Search (BFS)
--------------------------

  * The idea is to visit vertices "as soon as possible": examine every
    edge out of the current vertex before going on to vertices that are
    farther from the start. For example, in the directed graph above, the
    vertices would be traversed in the indicated order:

        [Figure: the same graph with each vertex labelled by its position
         in the BFS order: A (0), B (1), C (2), D (3), E (4), G (5), H (6),
         I (7), F (8), J (9)]

  * What information do we need to keep track of in order to do this
    properly? We need to know which vertices to visit next, and the best
    data structure for the kind of list we need is a queue. We also need to
    keep track of which vertices have already been visited so that we
    don't visit them twice, something we can easily do with a boolean
    array (just like in DFS).

  * Here's the algorithm (notice the similarity to DFS):

        BFS(G,v)  ( v is the vertex where the search starts )
            Queue Q := {};  ( start with an empty queue )
            for each vertex u, set visited[u] := false;
            enqueue Q, v;
            while (Q is not empty) do
                u := dequeue Q;
                if (not visited[u]) then
                    visited[u] := true;
                    for each unvisited neighbour w of u
                        enqueue Q, w;
                end if
            end while
        END BFS()

  * Note that just like DFS, this algorithm does *not* currently keep track
    of the edges that were used to visit new vertices.
    To do that, we can simply use another array predecessor[u] just like
    for DFS. One difference: with a queue, the copy of w that is dequeued
    first is the one that was enqueued first, so predecessor[w] should be
    set only the first time that w is enqueued.

  * Note that if there is more than one connected component in G, the
    algorithm can be modified similarly to DFS so that it visits every
    vertex in the graph.

Traversal trees
---------------

  * In DFS and BFS, if we keep track of every edge that led to an unvisited
    vertex during the search, we end up with what's called the "traversal
    tree" for that search. It's a set of edges that connects the starting
    vertex with every vertex that can be reached from there. The traversal
    tree is also called the "predecessor tree". (Example from the graph
    above, showing that the traversal tree is usually different for DFS and
    BFS.)

  * In fact, both searches impose additional structure on the graph G. For
    example, during DFS, a vertex can be in one of three possible states:
    "undiscovered" (when the vertex has not been reached at all),
    "discovered" but not "explored" (when the vertex has been reached but
    not all edges leading out of the vertex have been examined), and
    "explored" (when all edges from the vertex have been examined and we
    are done with the vertex). Similarly, during BFS, a vertex can be in
    any one of the same three states ("discovered" when it's been put in
    the queue and "explored" when it's been taken out of the queue).
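
The two searches and the predecessor bookkeeping described in these notes can be sketched in Python, under a few assumptions. The names GRAPH, traverse and path_to are illustrative and do not come from the notes. The graph is the example directed graph, stored as adjacency lists. Instead of updating predecessor[w] on every push, this sketch stores each pending vertex together with the vertex it was reached from, and records the predecessor only when the vertex is actually visited; that way the same loop serves both the stack (DFS) and the queue (BFS).

```python
from collections import deque

# Adjacency lists of the example directed graph (assumed from the notes).
GRAPH = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["E"],
    "E": ["C", "G", "H"],
    "F": ["D", "E"],
    "G": ["H", "I"],
    "H": ["F", "I", "J"],
    "I": [],
    "J": ["F", "I"],
}


def traverse(graph, start, depth_first):
    """Return (visit order, predecessor map) for a DFS or BFS from start."""
    visited = {u: False for u in graph}
    predecessor = {u: None for u in graph}
    order = []
    # Each pending entry is (vertex, vertex it was reached from).
    pending = deque([(start, None)])
    while pending:
        # A stack removes the newest entry; a queue removes the oldest one.
        u, parent = pending.pop() if depth_first else pending.popleft()
        if not visited[u]:
            visited[u] = True
            predecessor[u] = parent
            order.append(u)
            for w in graph[u]:
                if not visited[w]:
                    pending.append((w, u))
    return order, predecessor


def path_to(predecessor, v):
    """Follow predecessor links back from v to the start of the search."""
    path = []
    while v is not None:
        path.append(v)
        v = predecessor[v]
    return path[::-1]


dfs_order, dfs_pred = traverse(GRAPH, "A", depth_first=True)
bfs_order, bfs_pred = traverse(GRAPH, "A", depth_first=False)
print(" ".join(dfs_order))     # A D E H J I F G C B
print(" ".join(bfs_order))     # A B C D E G H I F J
print(path_to(bfs_pred, "I"))  # ['A', 'B', 'E', 'G', 'I']
```

The predecessor maps returned here are the traversal trees of the two searches: the path from A to I has five edges in the DFS tree but only four in the BFS tree, which illustrates that the two trees usually differ.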