A Scatter Plot with Different Time Formats

Suppose you want to create a scatter (XY) plot with time values on each axis, but the columns of time values have different formats; for instance, you might have date values in one column (axis), and time-of-day in the other column. Alas, a frustrating limitation of gnuplot is that it does not (innately) support different time formats (timefmt) for different variables (data columns).

Fortunately, although gnuplot supports only one timefmt setting, values with time formats can be plotted by converting the time values into ordinary numeric values. This is done with a combination of the strcol() and strptime() functions. The techniques underlying this example came from this tutorial.

Before we go any further, note that gnuplot has two separate time format settings:

set timefmt ...
this refers to the format of the values in the datafile; it tells gnuplot how to interpret the input time values.
set format ...
this refers to the format (which may or may not be a time) of the axis labels in the output plot; it tells gnuplot what text to display alongside the tics on the specified axis.

Strictly speaking, two different time formats can be scatter-plotted by using timefmt for one variable (column/axis) and converting the other column to numeric values. For the sake of the example, I've converted both columns, thereby avoiding timefmt altogether.
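For comparison, the mixed approach might look something like the sketch below. It is not the script I actually used; it assumes the same whitespace-separated listing (diroutput.txt) that appears in the full script further down, with dates in column 1 and times in column 2.

```gnuplot
# x axis: let gnuplot parse column 1 itself, using timefmt
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"

# y axis: convert column 2 by hand, since timefmt is already taken
set ydata time
set format y "%H:%M"

plot 'diroutput.txt' using 1:(strptime("%H:%M", strcol(2))) with points
```

Here timefmt governs only how column 1 is read, while the two set format lines govern only how the tic labels are written.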

Now for the example: some time ago, I was diagnosing a fault on a computer, and needed to determine if there was a pattern in the times when a large number of files were last modified. It seemed to me that the best way to do that was to visualise the data using a scatter plot with date on the x axis, and time of day on the y axis. This created a time-table of sorts:

[Figure: Timetable of file modification events]

I created the data by redirecting a DIR /S command into a plain text file. I then trimmed off the header and footer values to leave only a list of files with their associated metadata. The x values were dates in YYYY-MM-DD format, and the y values were times in (24 hour) HH:MM format. For example:

2016-04-02  21:11             2,287 6CD_16040217241.cof
2016-04-07  15:10            20,460 20160407.log
2016-04-13  21:42             1,558 201604132142.lg
2016-04-24  21:04           200,726 lb_dr.obj
2016-04-27  18:41             6,282 20160427.log

NOTE: On the Windows operating system, the format of the DIR command's output can be controlled using the system-wide Short date format. To adjust this, go to Control Panel -> Clock, Language, and Region -> Region and Language.

Clearly, the x and y column formats were completely different, so timefmt was insufficient. The trick to reading two different time formats was to read them as alphanumeric text (i.e. strings), and convert them to scalar numbers (e.g. elapsed seconds) relative to some arbitrary time point. gnuplot provides a number of string- and time-handling functions, one of which is strptime(); this function takes a date/time format string, along with a value string, and returns the equivalent number of seconds since 2000 (the epoch used by gnuplot 4.x; gnuplot 5 and later count from 1970 instead, which shifts the raw numbers but does not affect the technique).
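As a quick sanity check (a sketch, not part of the original script), strptime() can be paired with its counterpart strftime() at the gnuplot prompt to confirm that a string survives the round trip:

```gnuplot
# Parse a date string into seconds since gnuplot's epoch
s = strptime("%Y-%m-%d", "2016-04-02")

# The raw number depends on which epoch your gnuplot version uses
print s

# Formatting it back should reproduce the original date string
print strftime("%Y-%m-%d", s)
```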

To obtain the date values for the x axis, I used strcol(1), which returned the first column as string values, and the y values came from strcol(2), which gave the second column as strings. To convert the strings to numeric equivalents I wrote my own functions, inspired by the aforementioned web page.

For x (date), I used elapsed seconds (since 1st Jan 2000) at midnight on the given date, which was calculated by converting the YYYY-MM-DD date to seconds-since-2000 with

strptime( "%Y-%m-%d", t_date )

This value was then divided by the number of seconds in a day (86400), to give days-since-2000. This was rounded down with floor() to obtain midnight, since days are precise enough for our x axis.

NOTE: You may be wondering why I used “.0” in the definition of day_s. The reason for doing so was to avoid an “integer division”. For example, in gnuplot, print 2 / 4.0 returns 0.5, but print 2 / 4 returns 0. Whilst this isn't a problem for the date values, which are all greater than day_s, it definitely is a problem for the time values, which are all smaller (the division will return zero).

The number of days was then multiplied by seconds-per-day because the values must be in seconds for the labels' time format to work properly. The label's time format is specified with

set format x "%Y-%m-%d"

NOTE: We don't really need to round date-only values, since they obviously don't contain part-days (i.e. time-of-day). I've done so here only for completeness.
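To see what the rounding does when a value does carry a time of day, here is a small sketch using a made-up timestamp (the value and the variable names stamp and start_of_day are purely illustrative):

```gnuplot
# Seconds in a day, as a floating-point value
day_s = 86400.0

# Hypothetical timestamp that includes a time of day
stamp = strptime("%Y-%m-%d %H:%M", "2016-04-02 21:11")

# Whole days since the epoch, converted back to seconds
start_of_day = floor(stamp / day_s) * day_s

print strftime("%Y-%m-%d %H:%M", stamp)
print strftime("%Y-%m-%d %H:%M", start_of_day)
```

The second line printed should show the same date with the time reset to 00:00.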

The process is very similar for the y (time-of-day) values; I converted the HH:MM values to elapsed seconds since midnight. In other words, I treated the HH:MM (which would never exceed 23:59) as elapsed time and converted it to its equivalent in seconds (by omitting date from the format and value strings, we cancel out its effect). As an aside, I could just as easily have used "%M:%S" as the format string; the units might have changed, but the visual patterns would have been the same.
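The time-of-day conversion can be checked by hand in the same way. In this sketch the sample value 21:11 is taken from the listing above, and the result is compared with the arithmetic done manually; I would expect the two printed values to agree.

```gnuplot
# With no date in the format, only the time of day contributes
t = strptime("%H:%M", "21:11")
print t

# 21 hours and 11 minutes expressed in seconds
print 21*3600 + 11*60
```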

The nokey, xtics and bmargin settings are there only to make the plot a bit easier to read.

Here's the script:


reset
clear

# Seconds in a day. Use ".0" to make it floating-point.
day_s = 86400.0

# Change this to match the format of the data in your DIR listing.
date_format = "%Y-%m-%d"
date2sec(t_date) = strptime( date_format, t_date )
midnight(t_date) = floor( date2sec(t_date) / day_s ) * day_s

# Change this to match the format of the time in your DIR listing.
time_format = "%H:%M"
time2sec(t_time) = strptime( time_format, t_time )

set nokey

# Set up the x-axis label format
# In order to use date/time formats, we must indicate that the axis refers to time.
set xdata time
set format x date_format

# Leave enough space for the date labels
set bmargin 5
set xtics rotate by 60 offset -4,-4

# Set up the y-axis label format
set ydata time
set format y time_format

plot 'diroutput.txt' using (midnight(strcol(1))):(time2sec(strcol(2))) with points
set output
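If you want to try the technique without a DIR listing to hand, the following is a cut-down sketch that feeds the sample lines shown earlier through an inline datablock. It assumes gnuplot 5.0 or later (datablocks are not available in older versions), and the name $listing is my own choice.

```gnuplot
# Sample data, copied from the listing excerpt above
$listing << EOD
2016-04-02  21:11             2,287 6CD_16040217241.cof
2016-04-07  15:10            20,460 20160407.log
2016-04-13  21:42             1,558 201604132142.lg
2016-04-24  21:04           200,726 lb_dr.obj
2016-04-27  18:41             6,282 20160427.log
EOD

day_s = 86400.0
midnight(s) = floor( strptime("%Y-%m-%d", s) / day_s ) * day_s
time2sec(s) = strptime("%H:%M", s)

# Both axes show time values, each with its own label format
set xdata time
set format x "%Y-%m-%d"
set ydata time
set format y "%H:%M"

plot $listing using (midnight(strcol(1))):(time2sec(strcol(2))) with points notitle
```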


Copyright © Neil Carter

Last updated: 2016-05-17