Software Design
Lectures
Graphs

Starting Point

Have a list of rules with:

Target

One or more dependencies

Which may be targets as well

Action(s) to bring target up to date

Zero or more targets may be out of date

I.e., older than their dependencies

Goal: execute required actions

Must respect dependency order

Do not update something until all dependencies are up to date

Must only update things once

Efficiency: wasteful to recompile something eleven times

Correctness: actions may have side effects

Example

Can update D and E in any order

Can't update B until both D and E have been updated

[Dependency Ordering]
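
One way to meet both requirements (dependencies first, each target only once) is a depth-first walk that schedules a target after everything it depends on. The following is a minimal sketch, not the course's implementation: the class name UpdateOrder and the rule set (A depends on B and C, B depends on D and E) are assumptions made for illustration. It assumes the rules contain no cycles and uses Java 8 generics for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UpdateOrder {
    // deps maps each target to the targets it depends on (hypothetical data).
    static List<String> order(Map<String, List<String>> deps, String target) {
        List<String> result = new ArrayList<>();
        visit(deps, target, new HashSet<>(), result);
        return result;
    }

    private static void visit(Map<String, List<String>> deps, String target,
                              Set<String> done, List<String> result) {
        if (!done.add(target)) {
            return; // already scheduled: update things only once
        }
        for (String dep : deps.getOrDefault(target, Collections.emptyList())) {
            visit(deps, dep, done, result); // dependencies come first
        }
        result.add(target); // everything target depends on is already listed
    }
}
```

With the assumed rules, order(deps, "A") lists D and E before B, and B and C before A.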

Graphs

A graph is a set of nodes connected by arcs

Directed graph if the arcs have direction

Undirected graph if the arcs simply show connections

There are a lot of graphs in the world

Bus routes

Transitions in a finite state machine

Project dependencies

One of the fundamental data structures in computing

Kind of odd that java.util.Graph doesn't exist

Design Choices

Three basic representations

Graph owns nodes and arcs

Graph owns nodes, nodes own arcs

Graph owns arcs, arcs own nodes

Each one makes some algorithms cheap, and others expensive

We will implement directed graphs using the second option

Graph is a map

Keys are names of nodes

Values are sets of directly reachable nodes

I.e., the other ends of the arcs out of the node identified by the key

Note: make parameters and return values as general as possible

Use Collection instead of List, Set instead of HashSet, and so on

Makes code more flexible (fewer constraints on users)

Makes code easier to change (fewer external commitments)

Example

[Graph Representations Diagram]
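
The map-of-sets representation can be sketched directly with the standard collections. This is an illustrative fragment only: the class name MapGraphSketch and the three-node graph (A to B, A to C, B to C) are assumptions, and it uses generics where the lecture code uses raw types. Note that the variables are declared as Map and Set, while HashMap and HashSet appear only where objects are created.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MapGraphSketch {
    static Map<String, Set<String>> build() {
        // Keys are node names; values are the other ends of outgoing arcs.
        Map<String, Set<String>> nodes = new HashMap<>();
        nodes.put("A", new HashSet<>());
        nodes.put("B", new HashSet<>());
        nodes.put("C", new HashSet<>());
        nodes.get("A").add("B"); // arc A -> B
        nodes.get("A").add("C"); // arc A -> C
        nodes.get("B").add("C"); // arc B -> C
        return nodes;
    }
}
```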

Class Skeleton

class DirectedGraph {
  public         DirectedGraph() {...}

  public Set     getNodes() {...}
  public boolean hasNode(Object node) {...}
  public void    addNode(Object node) {...}
  public void    addNode(Object node, Collection arcs) {...}
  public void    removeNode(Object node) {...}

  public Set     getArcs(Object node) {...}
  public boolean hasArc(Object src, Object dst) {...}
  public void    addArc(Object src, Object dst) {...}
  public void    addArcs(Object src, Collection allDst) {...}
  public void    removeArc(Object src, Object dst) {...}

  public String  toString() {...}

  protected Map fNodes;
}

More Design Choices

What happens if we try to add a node that's already present?

Or duplicate an arc?

Or remove a node or arc that doesn't exist?

Could throw an exception

Means that users have to write code like "if not present then add" and "if present then delete"

Could just ignore operations that don't make sense

Results in late failures: a mistake goes unnoticed until something else breaks
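
The two policies can be put side by side. The sketch below is hypothetical (the class AddPolicies and both method names are invented for comparison, and it assumes Java 8 for putIfAbsent); the lecture's own class has a single addNode.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddPolicies {
    private final Map<String, Set<String>> nodes = new HashMap<>();

    // Strict policy: a duplicate is an error, so callers must check first.
    void addNodeStrict(String node) {
        if (nodes.containsKey(node)) {
            throw new IllegalArgumentException("node already present: " + node);
        }
        nodes.put(node, new HashSet<>());
    }

    // Lenient policy: adding an existing node silently does nothing.
    void addNodeLenient(String node) {
        nodes.putIfAbsent(node, new HashSet<>());
    }

    boolean hasNode(String node) {
        return nodes.containsKey(node);
    }
}
```

The strict version reports the mistake at the call that made it; the lenient version keeps calling code short but hides the duplicate.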

Still More Design Choices

What should a graph return when asked for its nodes?

Return map's keys as a set

What if it is asked for a node's arcs?

The set used to store the arcs?

Efficient, but allows users to mess up data structure's internals

A copy of that set?

Less efficient for large graphs with many arcs

But safer

What if a user wants all arcs?

Not directly available, but easy to construct

A set of two-element arrays?

A list of two-element lists?
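
Two of these choices are easy to sketch against a plain map of sets. The helper class ArcViews and its method names are assumptions for illustration, not part of the lecture's DirectedGraph; arcsCopy also assumes the node is present in the map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ArcViews {
    // Safer option: return a copy, so callers cannot alter the graph's own set.
    static Set<String> arcsCopy(Map<String, Set<String>> nodes, String node) {
        return new HashSet<>(nodes.get(node));
    }

    // All arcs, built on demand as a list of two-element lists [src, dst].
    static List<List<String>> allArcs(Map<String, Set<String>> nodes) {
        List<List<String>> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : nodes.entrySet()) {
            for (String dst : entry.getValue()) {
                result.add(Arrays.asList(entry.getKey(), dst));
            }
        }
        return result;
    }
}
```

Copying costs time and memory proportional to the number of arcs returned, which is the efficiency trade-off noted above.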

Implementation

public DirectedGraph() {
    fNodes = new HashMap();
}

public Set getNodes() {
    return fNodes.keySet();
}

public boolean hasNode(Object node) {
    return fNodes.containsKey(node);
}

public void addNode(Object node) {
    addEmptyNode(node);
}

public void addNode(Object node, Collection arcs) {
    addEmptyNode(node);
    addArcs(node, arcs);
}

protected Map       fNodes;

Utility Methods

Have to use a concrete set class for storing arcs

If we ever want to change how this is done, only want to have to change it in one place

Also need to handle nodes that already exist

protected void addEmptyNode(Object node) {
    if (!fNodes.containsKey(node)) {
        fNodes.put(node, new HashSet());
    }
}

Removing Nodes

Must ensure integrity of data structure

When removing a node, must also remove all arcs that point to it

Rely on being able to "remove" nonexistent arcs

public void removeNode(Object node) {
    fNodes.remove(node);
    Iterator ni = getNodes().iterator();
    while (ni.hasNext()) {
        removeArc(ni.next(), node);
    }
}

Adding Arcs

When adding an arc from A to B, must ensure that A and B exist

Data structure integrity again

Note: work with Set interface, not HashSet class

public void addArc(Object src, Object dst) {
    addNode(src);
    addNode(dst);
    Set arcSet = (Set)fNodes.get(src);
    arcSet.add(dst);
}

Other methods are straightforward

See exmpl/graph/DirectedGraph.java
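
For readers without that file to hand, here is one plausible shape for hasArc and removeArc. It is a guess at the "straightforward" methods, not a copy of DirectedGraph.java: the class name ArcOpsSketch is hypothetical, and it uses Java 8 generics and computeIfAbsent rather than the lecture's raw types. The point to notice is that removeArc quietly accepts arcs that do not exist, which is what removeNode relies on.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ArcOpsSketch {
    protected Map<Object, Set<Object>> fNodes = new HashMap<>();

    public void addArc(Object src, Object dst) {
        // Make sure both ends exist before recording the arc.
        fNodes.computeIfAbsent(src, k -> new HashSet<>());
        fNodes.computeIfAbsent(dst, k -> new HashSet<>());
        fNodes.get(src).add(dst);
    }

    public boolean hasArc(Object src, Object dst) {
        Set<Object> arcSet = fNodes.get(src);
        return (arcSet != null) && arcSet.contains(dst);
    }

    // Removing a nonexistent arc (or an arc from an unknown node) is a no-op.
    public void removeArc(Object src, Object dst) {
        Set<Object> arcSet = fNodes.get(src);
        if (arcSet != null) {
            arcSet.remove(dst);
        }
    }
}
```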

Reading Graphs

Write a parser to read graphs from files

Should be easy after the Makefile parser

Graph file format:

Blank lines and comments

Data lines containing:

Node name

Colon

Space-separated list of directly reachable nodes

Note: nodes may be implied

I.e., node may be listed as reachable, but not have its own entry

# Example graph file
a : b c
b : c
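
A parser for this format might look like the sketch below. Several details are assumptions, since the slides do not spell them out: comments are taken to be whole lines starting with #, a line without a colon is treated as an error, and the class name GraphReaderSketch is invented. It works on lines already read from the file and builds a plain map of sets instead of a DirectedGraph so that it stands alone (Java 8 or later).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GraphReaderSketch {
    static Map<String, Set<String>> parse(Iterable<String> lines) {
        Map<String, Set<String>> nodes = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing colon: " + line);
            }
            String src = line.substring(0, colon).trim();
            nodes.computeIfAbsent(src, k -> new HashSet<>());
            String rest = line.substring(colon + 1).trim();
            if (rest.isEmpty()) {
                continue; // node with no outgoing arcs
            }
            for (String dst : rest.split("\\s+")) {
                // Implied node: listed as a destination, no entry of its own.
                nodes.computeIfAbsent(dst, k -> new HashSet<>());
                nodes.get(src).add(dst);
            }
        }
        return nodes;
    }
}
```

For the example file above, the result has three nodes: a and b from their own entries, and c implied by appearing on the right-hand side.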

Graph Algorithms

Crop up everywhere in computing

Because many problems are best represented as graphs

Most are recursive

Must handle circularity

A->B, B->C, C->A

Solution is to keep track of nodes already visited using a set

If node being examined is already in the set, don't recurse

Example: Reachability

public static boolean reachable(DirectedGraph graph, Object src, Object dst, Set seen) {
    // Are we there yet? (Use equals(): == compares identity, not value.)
    if (src.equals(dst)) {
        return true;
    }
    
    // Try to get there indirectly
    Iterator ia = graph.getArcs(src).iterator();
    while (ia.hasNext()) {
        Object next = ia.next();
        if (!seen.contains(next)) {
            seen.add(next);
            if (reachable(graph, next, dst, seen)) {
                return true;
            }
        }
    }
    return false;
}

Reachability in Action: 1

[Reachability Diagram Stage 1]

Reachability in Action: 2

[Reachability Diagram Stage 2]

Reachability in Action: 3

[Reachability Diagram Stage 3]

Reachability in Action: 4

[Reachability Diagram Stage 4]

How to Call

public static void caller() {
    DirectedGraph g = new DirectedGraph();
    ...add nodes to graph...
    boolean b = reachable(g, "A", "Z", new HashSet());
}

Requiring users to create and pass in a set that is only used internally is clumsy

In practice, probably provide two methods:

public static boolean reachable(DirectedGraph graph, Object src, Object dst) {
    return reachable(graph, src, dst, new HashSet());
}

protected static boolean reachable(DirectedGraph graph, Object src, Object dst, Set seen) {
    ...as before...
}

Example: Shortest Path

Can't just keep adding nodes to seen set: a node first reached by a long route would then be excluded from every shorter one

Instead, have to add and remove nodes to keep track of current path

public static List shortest(DirectedGraph graph, Object src, Object dst, Set seen) {
    // Are we there yet? (Use equals(): == compares identity, not value.)
    if (src.equals(dst)) {
        List result = new ArrayList();
        result.add(src);
        return result;
    }
    
    ...Look at lengths of all paths starting from here...
    
    // If a valid path exists, put the starting node on it
    if (best != null) {
        best.add(0, src);
    }
    
    return best;
}

Shortest Path (cont.)

    // Look at lengths of all paths starting from here
    List best = null;
    Iterator ia = graph.getArcs(src).iterator();
    while (ia.hasNext()) {
        Object node = ia.next();
        if (!seen.contains(node)) {
            seen.add(node);
            List path = shortest(graph, node, dst, seen);
            seen.remove(node);
            if (path != null) {
                if ((best == null) || (path.size() < best.size())) {
                    best = path;
                }
            }
        }
    }

Notes on Shortest Path

Set of nodes already seen grows and shrinks

May be several ways to reach a node

Never include more than once in any path

Question: what happens if nodes accidentally left in set?

Note how src==dst case is handled

shortest(a, a) produces [a], not [a,a]

Ensures that paths of length greater than one are constructed correctly
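
Putting the two halves of the method together gives something like the following self-contained sketch. It follows the add-then-remove pattern from the slides but departs from them in labeled ways: the graph is a plain map of sets, the class name ShortestSketch is hypothetical, equals() replaces ==, and the public wrapper seeds the seen set with the start node, which the original caller does not do. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ShortestSketch {
    static List<String> shortest(Map<String, Set<String>> graph, String src, String dst) {
        Set<String> seen = new HashSet<>();
        seen.add(src); // assumption: keep the start node off every later path
        return shortest(graph, src, dst, seen);
    }

    private static List<String> shortest(Map<String, Set<String>> graph,
                                         String src, String dst, Set<String> seen) {
        if (src.equals(dst)) {
            List<String> result = new ArrayList<>();
            result.add(src);
            return result;
        }
        List<String> best = null;
        for (String node : graph.getOrDefault(src, Collections.emptySet())) {
            if (!seen.contains(node)) {
                seen.add(node);    // node is on the current path
                List<String> path = shortest(graph, node, dst, seen);
                seen.remove(node); // backtrack: node may be used by other paths
                if ((path != null) && ((best == null) || (path.size() < best.size()))) {
                    best = path;
                }
            }
        }
        if (best != null) {
            best.add(0, src); // put the starting node on the front
        }
        return best; // null means no path exists
    }
}
```

Because every simple path is explored, this approach can be slow on large, densely connected graphs.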

A Note on Testing

Test graph algorithms by checking output for sample input

But looking at output repeatedly is tedious and error-prone

Better solution: build a tool

Read file containing expected output

Read actual output from program

Report whether there are differences

How to get program's output to the comparison program?

Could save to a temporary file

Better to use a Unix pipe

See exmpl/graph/TextDiff.java for comparison tool
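
The comparison step itself is small. The sketch below is not the course's TextDiff.java; the class name TextDiffSketch, the output messages, and the exit status are all assumptions. It reads the expected output from the file named on the command line and the actual output from standard input, so it can sit at the end of a pipe (Java 8 or later).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class TextDiffSketch {
    // Returns the 1-based number of the first differing line, or 0 if identical.
    static int firstDifference(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i + 1;
            }
        }
        // One input is a prefix of the other unless the lengths match.
        return (expected.size() == actual.size()) ? 0 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        List<String> expected = Files.readAllLines(Paths.get(args[0]));
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        List<String> actual = stdin.lines().collect(Collectors.toList());
        int line = firstDifference(expected, actual);
        if (line == 0) {
            System.out.println("same");
        } else {
            System.out.println("differ at line " + line);
            System.exit(1); // nonzero status lets make notice the failure
        }
    }
}
```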

Makefile contains lines like these:

JAVA_FLAGS = -classpath ${SCJAVAPATH}
JAVA_RUN   = java ${JAVA_FLAGS} -enableassertions

RUN_ALG    = ${JAVA_RUN} GraphAlgorithms
RUN_DIFF   = ${JAVA_RUN} TextDiff

        ${RUN_ALG} shortest empty.grf | ${RUN_DIFF} shortest_empty.txt

$Id: graph.html,v 1.1.1.1 2004/01/04 05:02:31 reid Exp $