Graph Traversals: Depth-First, Breadth-First...

A graph traversal is the process of systematically visiting each node in a graph. This can be tricky because many graphs have cycles (loops).

The two most common graph traversal algorithms are breadth-first search (BFS) and depth-first search (DFS). Both of these traversal algorithms can be viewed as refinements of the tricolor algorithm, ones in which choices are made in a way that leads to their differing behavior.

The Tricolor Algorithm

(Heavily from https://andrewcmyers.github.io/oodds/lecture.html?id=traversals)

Abstractly, graph traversal can be expressed in terms of the tricolor algorithm due to Dijkstra and others. In this algorithm, graph nodes are assigned one of three colors that can change over time:

The progress of the algorithm is depicted by... . Initially there are no black nodes and the root is gray. As the algorithm progresses, white nodes turn into gray nodes and gray nodes turn into black nodes. Eventually there are no gray nodes left and the algorithm is done. All remaining nodes are either white or black, with the black nodes consisting of exactly the nodes reachable from the roots.

roots initial state undiscovered frontier finished during traversal unreachable finished reachable after traversal

The algorithm maintains a key two-part invariant, called the black–white invariant or tricolor invariant:

  1. The root is not white.
  2. There is no edge from a black node to a white node.

Because there is no edge from a black node to a white node, any edge exiting a black node goes either to a black node or to a gray frontier node.

This invariant clearly holds initially, and because it is true at the end, we know that any remaining white nodes cannot be reached from the black nodes.

The algorithm pseudo-code is as follows:

This algorithm is abstract enough to describe many different graph traversals. It allows the particular implementation to choose the node n from among the gray nodes; it allows choosing which and how many white successors to color gray; and it allows delaying the coloring of gray nodes black. We say that such an algorithm is nondeterministic because its next step is not uniquely defined. However, as long as it does some work on each gray node that it picks, any implementation that can be described in terms of this algorithm will finish. Moreover, because the black–white invariant is maintained, it must reach all reachable nodes in the graph.

One advantage of defining graph search in terms of the tricolor algorithm is that the tricolor algorithm works even when gray nodes are processed concurrently, as long as the black–white invariant is maintained. The invariant ensures that whatever graph traversal we choose, the algorithm will work even when different gray nodes are handled concurrently.

Relying on a distance Field

(Heavily from https://andrewcmyers.github.io/oodds/lecture.html?id=traversals)

We can compute distances explicitly during the traversal by replacing the colors with a distance field in each vertex, as in:

frontier = new Queue();
frontier.push(root);
root.distance = 0;
for (v != root)
  v.distance = ∞;

while (frontier not empty) {
  g = frontier.pop();
  foreach (g → v) {
    if (v.distance == ∞) {
      frontier.push(v);
      v.distance = g.distance + 1;
    }
  } // foreach
}

Here the white nodes are those whose distance field is ∞, the gray nodes are those whose distance field is finite and that are in frontier, and the black nodes are those whose distance field is finite and that are not in frontier. When a new white node is discovered, its distance is set to a finite value one greater than its predecessor, and it is pushed onto the queue; this corresponds to turning the node from white to gray. All nodes on the queue are gray.

When a node v is popped off the queue,

When done, v changes from gray to black.

In breadth-first search frontier is a first-in, first-out (FIFO) queue. The final distance value of a node is the length of the minimum path from a root to that node. All the nodes on the queue have distance within one of each other.

In breadth-first search, there is a set of nodes to be popped off, at some distance k from the source, and another set of elements, later on the queue, at distance k+1. Every time a new node is pushed onto the queue, it is at distance k+1 until all the nodes at distance k are gone, and k then increases by one.