Nov 2018 Rational Exuberance

An easy way to find redundant edges in a DAG

Dependency graphs are quite useful to visualize a series of jobs that need to run in a specific order. When creating such a graph, you typically specify the logical dependencies of jobs. For example that job A has to run before jobs B and C.

When the graph becomes more complicated, some of these logical dependencies will often become redundant, i.e. they are not required for determining the order of jobs.

A simple example for this is the following graph, which has only one more node. The new node has logical dependencies to all other nodes though. Now the logical dependency of node D to node A is redundant, due to the additional dependencies from nodes B and C. Redundant in this case means that even if the edge did not exist, all other jobs would still need to run before the job from node D can run.

Technically this means that the topological order[1] of the graph would not change, and removing the edge from A to D would yield the transitive reduction[2] of the graph. Practically it means the edge is probably just adding noise to the graph, instead of helping to visualize the computational problem. Finding these redundant edges can be done in two simple steps:
  1. Compute all longest path from any node to any other node in the DAG.
  2. If an edge exists that is not the longest path between these nodes, the edge is redundant.

For the example above, the graph can be represented as a list of edges:

edges = [
    ['A', 'B'],
    ['A', 'C'],
    ['A', 'D'],
    ['B', 'D'],
    ['C', 'D']
]
nodes = set(itertools.chain.from_iterable(edges))
Then the longest paths within the DAG can be computed with the Floyd-Warshall[3] algorithm, by setting all edge weights to minus one:
distances = defaultdict(lambda: defaultdict(lambda: float('infinity')))

for a, b in edges:
    distances[a][b]= -1

for node in nodes:
    distances[node][node] = 0

for x, y, z in product(nodes, repeat=3):
    if distances[y][z] > distances[y][x] + distances[x][z]:
        distances[y][z] = distances[y][x] + distances[x][z]

And then a lookup for each edge shows the redundant ones:

for a, b in edges:
    if distances[a][b] != -1:
        print('Redundant edge: (%s -> %s)' % (a, b))

The full example script can be found on GitHub[4].

Some additional notes:


References: