## An easy way to find redundant edges in a DAG

Dependency graphs are quite useful to visualize a series of jobs that need to run in a specific order. When creating such a graph, you typically specify the logical dependencies of jobs. For example that job A has to run before jobs B and C.

A simple example for this is the following graph, which has only one more node. The new node has logical dependencies to all other nodes though. Now the logical dependency of node D to node A is redundant, due to the additional dependencies from nodes B and C. Redundant in this case means that even if the edge did not exist, all other jobs would still need to run before the job from node D can run.

^{[1]}of the graph would not change, and removing the edge from A to D would yield the transitive reduction

^{[2]}of the graph. Practically it means the edge is probably just adding noise to the graph, instead of helping to visualize the computational problem. Finding these redundant edges can be done in two simple steps:

- Compute all longest path from any node to any other node in the DAG.
- If an edge exists that is not the longest path between these nodes, the edge is redundant.

For the example above, the graph can be represented as a list of edges:

```
edges = [
['A', 'B'],
['A', 'C'],
['A', 'D'],
['B', 'D'],
['C', 'D']
]
nodes = set(itertools.chain.from_iterable(edges))
```

Then the longest paths within the DAG can be computed with the
Floyd-Warshall^{[3]}algorithm, by setting all edge weights to minus one:

```
distances = defaultdict(lambda: defaultdict(lambda: float('infinity')))
for a, b in edges:
distances[a][b]= -1
for node in nodes:
distances[node][node] = 0
for x, y, z in product(nodes, repeat=3):
if distances[y][z] > distances[y][x] + distances[x][z]:
distances[y][z] = distances[y][x] + distances[x][z]
```

And then a lookup for each edge shows the redundant ones:

```
for a, b in edges:
if distances[a][b] != -1:
print('Redundant edge: (%s -> %s)' % (a, b))
```

The full example script can be found on
GitHub^{[4]}.

Some additional notes:

- There are faster ways to compute the redundant edges than using the Floyd-Warshall algorithm. But if the problem is about graphs that are small enough to display them on a screen, this likely won't matter.
- The main reason mentioned here for finding these redundant edges is to remove clutter from the visualization of the graph. Actually removing the edge from the logical dependencies may not be a good idea in this case, since it may (unnoticed) become important again if another edge or node is removed.
- The Graphiz
tool
`tred`

^{[5]}works nicely to filter out such redundant edges directly from dot files.

References: