===========================================================================
CSC B63               Lecture Summary for Week 9                Summer 2008
===========================================================================

[[Q: denotes a question that you should think about and that will be
     answered during lecture. ]]

------
Graphs  [ Appendix B.4 ]
------

  * A graph G = (V,E) consists of a set of "vertices" (or "nodes") V and a
    set of "edges" (or "arcs") E.
      - In a "directed" graph, each edge is an ordered pair of nodes (u,v),
        and the pair (u,v) is considered different from the pair (v,u);
        also, self-loops (edges of the form (u,u)) are allowed.
      - In an "undirected" graph, each edge is a set of two vertices {u,v}
        (so {u,v} and {v,u} are the same), and self-loops are disallowed.
      - A "weighted" graph is either directed or undirected, and each edge
        e in E is assigned a real number w(e) called its "weight" (or
        sometimes "cost").

  * Standard operations on graphs:
      - Add a vertex; Remove a vertex; Add an edge; Remove an edge.
      - Edge Query: given two vertices u,v, find out if the directed edge
        (u,v) or the undirected edge {u,v} is in the graph.
      - Neighbourhood: given a vertex u in an undirected graph, get the set
        of vertices v such that {u,v} is an edge (denoted N(u) or Nbr(u)).
      - In-neighbourhood, out-neighbourhood: given a vertex u in a directed
        graph, get the set of vertices v such that (v,u) (or (u,v),
        respectively) is an edge (denoted N_in(u) and N_out(u),
        respectively).
      - Degree, in-degree, out-degree: compute the size of the
        neighbourhood, in-neighbourhood, or out-neighbourhood, respectively
        (denoted deg(u), deg_in(u) and deg_out(u)).
      - Traversal: visit each vertex of a graph (in a particular order) to
        perform some task.
      - etc.

[[Q: What are the standard data structures for graphs?
]]

---------------------------------------
Data structures for representing graphs  [ Section 22.1 ]
---------------------------------------

Drawing a pretty picture of a graph works well for humans, but computers
aren't so happy about a pictorial representation. We now discuss three ways
to represent a graph in a computer. Throughout, n = |V| is the number of
vertices and m = |E| is the number of edges.

[[Q: Study the trade-offs between these representations. What are the space
     requirements of each representation? What is the time requirement of
     each "common" operation for each? Can you solve different problems
     more efficiently with different representations? ]]

  * adjacency-list representation
      - array of n elements Adj[], each entry indexed by a vertex of G
      - Adj[v] is a list of the vertices adjacent to v (either a linked
        list or an array)
      - works well for representing directed or undirected graphs
      - for a weighted graph, store weights in Adj[v] along with the vertex
        (i.e., for edge (v,u) with weight w, store the pair [u, w] in the
        Adj[v] list)
      - there are exactly 2m (undirected graph) or m (directed graph)
        entries over all adjacency lists
        -> efficient space storage

  * adjacency-matrix representation
      - use an n x n matrix A, where entry A[u,v] = 1 if (u,v) is an edge,
        and A[u,v] = 0 if (u,v) is not an edge
      - efficient edge query operation, but uses more memory
      - works well for representing directed or undirected graphs
      - for a weighted graph, store weights in A: if all weights are
        nonzero, A[u,v] = 0 means that edge (u,v) is not in the graph, and
        otherwise A[u,v] = w(u,v)
      - for an undirected graph, notice that the upper and lower triangles
        are mirror images (since (u,v) is an edge iff (v,u) is an edge)
        -> need only store the entries on and above the main diagonal,
           reducing memory usage by nearly half
      - additional storage win for unweighted graphs: only need a single
        bit to store whether (u,v) is an edge instead of an entire word of
        memory
        -> this only changes the constant factor hidden by the asymptotic
           notation

  * edge-list representation
      - store a list of edges (e.g., linked list, array of m
        entries, etc.)
      - efficient memory storage
      - works well for representing directed or undirected graphs
      - for weighted graphs, store the weight along with the edge in the
        list

[[Q: Can you come up with other useful representations that permit some of
     the "standard operations" to be answered in constant time? ]]

--------------------
Breadth-First Search  [ Section 22.2 ]
--------------------

  * Starting from a specified "source" vertex s in V, BFS visits every
    vertex v in G that can be reached from s, and in the process,
    constructs for each v a path from s to v with the smallest number of
    edges (together, these paths form a "BFS-tree" of the graph). BFS works
    on directed or undirected graphs: we describe it for directed graphs.

  * To keep track of progress, each vertex is given a "colour", which is
    initially white. The first time that a vertex is encountered, its
    colour is changed to gray, and once a vertex has been examined (we'll
    see what the difference is between "encountered" and "examined" in a
    second), its colour is changed to black. At the same time, for each
    vertex v, we also keep track of the predecessor (the parent) of v in
    the BFS tree, p[v], and we keep track of the "distance" from s to v
    (the number of edges on the path from s to v), d[v].

  * Intuitively, white vertices are "unknown" to BFS, black vertices are
    "known" and have been fully "explored" (i.e., BFS has encountered all
    their neighbours), and gray vertices are known but not fully explored:
    they represent the "frontier". The distinction between black and gray
    vertices is important: it's how BFS keeps track of which vertices to
    explore next so that it really is working in a "breadth-first" manner.
    In order to manage the gray vertices, BFS stores them in a queue so
    that they are dealt with in a first-in, first-out manner.

    BFS(G=(V,E),s)
        for all vertices v in V
            colour[v] := white
            d[v] := infinity
            p[v] := NIL
        end for
        initialize an empty queue Q
        colour[s] := gray
        d[s] := 0
        p[s] := NIL
        ENQUEUE(Q,s)
        while Q is not empty do
            u := DEQUEUE(Q)
            for each edge (u,v) in E do
                if colour[v] == white then
                    colour[v] := gray
                    d[v] := d[u] + 1
                    p[v] := u
                    ENQUEUE(Q,v)
                end if
            end for
            colour[u] := black
        end while
    END BFS

  * Look at the example in the textbook.

  * Each node is enqueued at most once, since a node is enqueued only when
    it is white, and its colour is changed to gray the first time it is
    enqueued. In particular, this means that the adjacency list of each
    node is examined at most once, so with the adjacency-list
    representation the total running time of BFS is O(n+m), linear in the
    size of the adjacency lists.

  * We can show that at the end of BFS, d[v] is equal to the number of
    edges on a shortest path from s to v (i.e., a path with the smallest
    number of edges).
    -> proof is outlined in "BFS computes shortest-path proof" handout

  * Applications:
      - Computing single-source shortest paths / distances in an unweighted
        graph (e.g., finding a shortest path through a maze).
      - Discovering connected components in a graph.
      - Identifying bipartite graphs, finding a 2-colouring of a graph.
      - Used for traversing decision trees in artificial intelligence.
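
  * To experiment with BFS, here is a minimal Python sketch of the
    pseudocode above. It is an illustration only, not part of the course
    materials. Assumptions: the graph starts out as an edge list and is
    converted to adjacency lists stored in a dictionary; the helper names
    (adjacency_lists, bfs, path_to) and the small example graph are
    hypothetical; NIL is represented by None and infinity by float("inf").

```python
from collections import deque


def adjacency_lists(vertices, edges):
    """Convert an edge list of a directed graph into adjacency lists."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)  # for an undirected graph, also append u to adj[v]
    return adj


def bfs(adj, s):
    """Breadth-first search from source s, following the pseudocode above.

    adj maps every vertex to the list of its out-neighbours.
    Returns (d, p): d[v] is the number of edges on a shortest path from s
    to v (infinity if v cannot be reached), and p[v] is the parent of v in
    the BFS tree (None for s and for unreachable vertices).
    """
    colour = {v: "white" for v in adj}
    d = {v: float("inf") for v in adj}
    p = {v: None for v in adj}

    colour[s] = "gray"
    d[s] = 0
    queue = deque([s])          # the gray "frontier", in FIFO order
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if colour[v] == "white":
                colour[v] = "gray"
                d[v] = d[u] + 1
                p[v] = u
                queue.append(v)
        colour[u] = "black"     # all neighbours of u have been encountered
    return d, p


def path_to(p, s, v):
    """Follow parent pointers back from v; return the s-to-v path or None."""
    path = [v]
    while path[-1] != s:
        parent = p[path[-1]]
        if parent is None:      # v was never reached from s
            return None
        path.append(parent)
    path.reverse()
    return path


# Hypothetical example: z has an edge into s, but s cannot reach z.
vertices = ["s", "a", "b", "c", "d", "e", "z"]
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "e"), ("d", "e"), ("z", "s")]
adj = adjacency_lists(vertices, edges)
d, p = bfs(adj, "s")
print(d["e"], path_to(p, "s", "e"))
print(d["z"], path_to(p, "s", "z"))
```

    Each vertex enters the queue at most once and each adjacency list is
    scanned once, which matches the O(n+m) bound discussed above. The
    sketch uses collections.deque because removing from the front of a
    plain Python list takes time proportional to the length of the list.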