Visualizing timeline logs
A lot of applications produce logs that basically contain a series of events together with their start and finish time as in the following example table.
Event name | Start time | Finish time |
---|---|---|
A | 0 | 10 |
B | 5 | 20 |
C | 10 | 30 |
D | 30 | 50 |
E | 30 | 35 |
F | 30 | 40 |
Even from this small example table it is not immediately obvious which events were overlapping in time. A good way to see this easier, is to create a simple Gantt chart[1] for the events:
Now it's easily visible that event B was overlapping with A and C, while events D, E and F all started at the same time once C is finished.
The chart still has a drawback though - it uses a new lane for each event. A more dense representation of the same events would only need half as many lanes, as can be seen in the following chart:
So in this more dense representation, events A, C and D as well as B and E share a lane, since they are not overlapping. This makes it much easier to spot (possible) dependencies and critical paths between the different events.
For such a dense chart the assigned lanes for each event need to be computed though. If the goal is simply to use as few different lanes as possible, the representation is not necessarily unique. For example the previous chart could arbitrarily change the lane assignments of events D, E and F while still using only three different lanes. Defining a sort order for the events by i.e. duration or name can make the assignment deterministic though.
The following code shows one (not entirely deterministic) example how to assign lanes to events to create a chart as above. Assuming the log is available in the following format:
log = [
{'event-name': 'A', 'event-start': 0, 'event-finish': 10},
{'event-name': 'B', 'event-start': 5, 'event-finish': 20},
{'event-name': 'C', 'event-start': 10, 'event-finish': 30},
{'event-name': 'D', 'event-start': 30, 'event-finish': 50},
{'event-name': 'E', 'event-start': 30, 'event-finish': 35},
{'event-name': 'F', 'event-start': 30, 'event-finish': 40},
]
# In case a event starts at the same time as another finishes, we make
# sure that we first free the lane so it can be reassigned.
EVENT_FINISH, EVENT_START = 0, 1
num_lanes = 0 # Number of lanes needed so far.
free_lanes = [] # Set of currently unassigned lanes.
event_lanes = {} # Which event gets assigned to which lane.
The events are added to a heap, sorted such that the event
starting first will be the first to be removed from the
heap:
import heapq
# Create start events for each log entry, sorted by the event start
# timestamp and type.
events = []
for l in log:
heapq.heappush(
events,
(l['event-start'], EVENT_START, l['event-name'], l['event-finish'])
)
To compute lanes for the events, the first event is then
removed from the heap and assigned to one of the currently
available lanes. The just assigned lane will be removed from
the pool of available lanes. To make sure the lane becomes
available again once the event finishes, a
new EVENT_FINISH
event is inserted into the
heap. This event contains the timestamp when the event
finishes and the additional information of which lane it was
assigned to. When the next event of the heap is one of these
finish events, the assigned lane can be added to the set of
available ones again.
# Now iterate through all events. If it is a event start event, we
# assign a free lane and insert a finish event. If it is a stop event,
# we add the lane used by the event to the pool again.
while events:
e = heapq.heappop(events)
# Event start case.
if e[1] == EVENT_START:
# If there is no free lane, increase the number of lanes.
if free_lanes == []:
num_lanes += 1
free_lanes.append(num_lanes)
# Assign free lane to event
lane = free_lanes.pop()
event_lanes[e[2]] = lane
# Insert event finish event.
heapq.heappush(
events,
(e[3], EVENT_FINISH, lane)
)
# Event finish case.
else:
# Free lane and add to the pool.
free_lanes.append(e[2])
Once all events from the heap are removed,
the event_lanes
dictionary contains an assigned
lane for each event, which can be used to create the more
dense chart.
The complete example script can be found on
GitHub[2]. Runtime is due to
the heap operations at O(n log n)
, with n being
the number of log entries. Assuming that events either need to
be sorted, or lanes must be queried for available intervals,
this should be optimal. And considering that the chart is
meant to be displayable, the number of events should be fairly
small anyway.
References: