Dec 2018 Rational Exuberance

Visualizing timeline logs

A lot of applications produce logs that basically contain a series of events together with their start and finish time as in the following example table.

Event nameStart timeFinish time

Even from this small example table it is not immediately obvious which events were overlapping in time. A good way to see this easier, is to create a simple Gantt chart[1] for the events:

Now it's easily visible that event B was overlapping with A and C, while events D, E and F all started at the same time once C is finished.

The chart still has a drawback though - it uses a new lane for each event. A more dense representation of the same events would only need half as many lanes, as can be seen in the following chart:

So in this more dense representation, events A, C and D as well as B and E share a lane, since they are not overlapping. This makes it much easier to spot (possible) dependencies and critical paths between the different events.

For such a dense chart the assigned lanes for each event need to be computed though. If the goal is simply to use as few different lanes as possible, the representation is not necessarily unique. For example the previous chart could arbitrarily change the lane assignments of events D, E and F while still using only three different lanes. Defining a sort order for the events by i.e. duration or name can make the assignment deterministic though.

The following code shows one (not entirely deterministic) example how to assign lanes to events to create a chart as above. Assuming the log is available in the following format:

log = [
    {'event-name': 'A', 'event-start': 0,  'event-finish': 10},
    {'event-name': 'B', 'event-start': 5,  'event-finish': 20},
    {'event-name': 'C', 'event-start': 10, 'event-finish': 30},
    {'event-name': 'D', 'event-start': 30, 'event-finish': 50},
    {'event-name': 'E', 'event-start': 30, 'event-finish': 35},
    {'event-name': 'F', 'event-start': 30, 'event-finish': 40},

# In case a event starts at the same time as another finishes, we make
# sure that we first free the lane so it can be reassigned.

num_lanes = 0    # Number of lanes needed so far.
free_lanes = []  # Set of currently unassigned lanes.
event_lanes = {} # Which event gets assigned to which lane.
The events are added to a heap, sorted such that the event starting first will be the first to be removed from the heap:
import heapq

# Create start events for each log entry, sorted by the event start
# timestamp and type.
events = []
for l in log:
        (l['event-start'], EVENT_START, l['event-name'], l['event-finish'])
To compute lanes for the events, the first event is then removed from the heap and assigned to one of the currently available lanes. The just assigned lane will be removed from the pool of available lanes. To make sure the lane becomes available again once the event finishes, a new EVENT_FINISH event is inserted into the heap. This event contains the timestamp when the event finishes and the additional information of which lane it was assigned to. When the next event of the heap is one of these finish events, the assigned lane can be added to the set of available ones again.
# Now iterate through all events. If it is a event start event, we
# assign a free lane and insert a finish event. If it is a stop event,
# we add the lane used by the event to the pool again.
while events:
    e = heapq.heappop(events)

    # Event start case.
    if e[1] == EVENT_START:
        # If there is no free lane, increase the number of lanes.
        if free_lanes == []:
            num_lanes += 1

        # Assign free lane to event
        lane = free_lanes.pop()
        event_lanes[e[2]] = lane

        # Insert event finish event.
            (e[3], EVENT_FINISH, lane)
    # Event finish case.
        # Free lane and add to the pool.

Once all events from the heap are removed, the event_lanes dictionary contains an assigned lane for each event, which can be used to create the more dense chart.

The complete example script can be found on GitHub[2]. Runtime is due to the heap operations at O(n log n), with n being the number of log entries. Assuming that events either need to be sorted, or lanes must be queried for available intervals, this should be optimal. And considering that the chart is meant to be displayable, the number of events should be fairly small anyway.