Tutorial 8: An Introduction to Depth-First Search

The Algorithm

A 'Depth-first search' of a graph is a procedure for iterating over vertices in a graph. When choosing which vertex to explore next, we favour children over siblings (as opposed to a breadth-first search order where siblings are favoured over children).

As was the case with breadth-first search (and any graph traversal) the evolution of a DFS may be described as a process by which every node in the graph is coloured, starting with white, at some point becoming grey when 'activated', and finally becoming black when 'finished'.

To get a feel for this, we need to run some DFSs on some graphs. There are many nice pictures in CLRS, as well as in the the wiki entry on Depth-first search.

While it makes sense to do a DFS on any graph, the applications we'll discuss concern directed graphs, so imagine our underlying graph is directed.

So what's the output of a DFS?

DFS is an algorithm in that it describes a procedure. Many of the algorithms we discuss solve particular problems are understood as having an output (even in the data structure world: the input is the current data-structure state plus some auxiliary arguments for the current operation, the output is the data-structure state after making the modifications). While an algorithm having an output is a natural thing, it isn't a necessary thing, and we won't usually talk about the output of a DFS either. We will see shortly that taking a DFS is an important subroutine of other algorithms (which will have outputs).

(Some readers may have been happy without this discussion. If so, great.)

Graph Coordinates derived from the algorithm

Running a DFS on a graph gives us some quantities and coordinates with which to make sense of graph structure.

Vertex coordinates: Discovery and Finish Times

There are two numbers one may associate with a node $v$: the discovery time $d(v)$ and the finish time $f(v)$. The $d(v)$ is the time(-step) when $v$ is coloured grey, $f(v)$ the time when $v$ is coloured black.

Edge type: Tree, forward, back or cross

The edges can be classified into one of four types:

Tree edges; those edges that were followed by the DFS just before activating a new node

Note that the tree edges, taken together with $V$, form a tree (d'uh). Ok so they form a forest, since we aren't guaranteed that the graph is fully or strongly connected. The edges that are not tree edges are one of three types:

Forward edges: edges that point to an already discovered node that is a descendent in the DFS tree
Back edges: edges that point to one's ancestor in the DFS tree
Cross edges: anything edge that is not classified by one of the previous three categories.

And a proposition about these categories (thm 22.10 from CLRS):

In a depth-first search of an undirected graph $G$, every edge of $G$ is either a tree edge or a back edge.

Properties of activation intervals

The following theorem appears in CLRS:

For $u,v \in V$, the intervals $I_u = [d(u),f(u)]$ and $I_v = [d(v), f(v)]$ are either disjoint, or one is properly contained in the other.

We formulated the following proposition together in tutorial

For $u\in V$, $f(u) - d(u) = 2m+1$ where $m$ is the number of descendents of $u$ in the DFS tree.

We reached for but didnt't give a formal proof. Our best intuition was that the $2m$ comes from that the clock increments twice for each of our $m$ descendents (opening and closing) and once for us.

CSC 265 Fall 2013