Suppose you want to create a scatter (XY) plot with time values on each
axis, but the columns of time values have different formats; for instance,
you might have date values in one column (axis), and time-of-day in the other
column. Alas, a frustrating limitation of gnuplot is that it does not
(innately) support different time formats (timefmt) for different variables
(data columns).
Fortunately, although gnuplot supports only one timefmt setting, values with
different time formats can be plotted by converting the time values into
ordinary numeric values. This is done with a combination of the strcol() and
strptime() functions. The techniques underlying this example came from this
tutorial.
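Before we go any further, note that gnuplot has two separate time format settings, one for reading time values from the data and one for writing them as axis labels:
set timefmt ...    # input: how time strings in the data file are parsed
set format ...     # output: how the tick labels on an axis are written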
Strictly speaking, two different time formats can be scatter-plotted by
using timefmt for one variable (column/axis) and converting the other column
to numeric values. For the sake of the example, I've converted both columns,
thereby avoiding timefmt altogether.
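For comparison, here is a minimal sketch of that mixed approach. It is not the script used in this article: it assumes the same 'diroutput.txt' listing described below, with dates in column 1 and times in column 2, and lets timefmt handle the date column while strptime() converts the time column.

```gnuplot
# Sketch only: timefmt parses column 1 (dates); column 2 is converted by hand.
set timefmt "%Y-%m-%d"      # input format for the x (date) column
set xdata time              # a plain column number for x is read using timefmt
set format x "%Y-%m-%d"     # tick label format for x

set ydata time              # y values are seconds, labelled as times
set format y "%H:%M"

# Column 2 is read as a string and converted to seconds with strptime()
plot 'diroutput.txt' using 1:(strptime("%H:%M", strcol(2))) with points
```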
Now for the example: some time ago, I was diagnosing a fault on a computer, and needed to determine if there was a pattern in the times when a large number of files were last modified. It seemed to me that the best way to do that was to visualise the data using a scatter plot with date on the x axis, and time of day on the y axis. This created a time-table of sorts:
I created the data by redirecting a DIR /S command into a plain text file. I
then trimmed off the header and footer values to leave only a list of files
with their associated metadata. The x values were dates in YYYY-MM-DD format,
and the y values were times in (24 hour) HH:MM format. For example:
2016-04-02  21:11      2,287 6CD_16040217241.cof
2016-04-07  15:10     20,460 20160407.log
2016-04-13  21:42      1,558 201604132142.lg
2016-04-24  21:04    200,726 lb_dr.obj
2016-04-27  18:41      6,282 20160427.log
NOTE: On the Windows operating system, the format of the DIR command's output can be controlled using the system-wide Short date format. To adjust this, go to Control Panel -> Clock, Language, and Region -> Region and Language.
Clearly, the x and y column formats were completely different, so a single
timefmt setting was insufficient. The trick to reading two different time
formats was to read them as alphanumeric text (i.e. strings), and convert them
to scalar numbers (e.g. elapsed seconds) relative to some arbitrary time point.
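gnuplot provides a number of string- and time-handling functions, one of which
is strptime(); this function takes a date/time format string, along with a
value string, and returns the equivalent number of seconds since the start of
the year 2000. (gnuplot 5.0 and later count from 1 January 1970 instead; this
changes the numbers but not the technique.)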
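A quick way to get a feel for strptime() is to try it at the gnuplot prompt. The sketch below uses two values from the sample listing; the exact number printed for the date depends on which epoch your gnuplot version uses, whereas the time-only value should come out as 21 x 3600 + 11 x 60 = 76260 seconds.

```gnuplot
# Date only: seconds from gnuplot's epoch to midnight on that date
print strptime("%Y-%m-%d", "2016-04-02")

# Time only: with no date in the format, the result is seconds since midnight
print strptime("%H:%M", "21:11")

# Round trip: strftime() turns a number of seconds back into formatted text
print strftime("%H:%M", strptime("%H:%M", "21:11"))
```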
To obtain the date values for the x axis, I used strcol(1), which returned the
first column as string values, and the y values came from strcol(2), which
gave the second column as strings. To convert the strings to numeric
equivalents I wrote my own functions, inspired by the aforementioned web page.
For x (date), I used elapsed seconds (since 1st Jan 2000) at midnight on the given date, which was calculated by converting the YYYY-MM-DD date to seconds-since-2000 with
strptime( "%Y-%m-%d", t_date )
This value was then divided by the number of seconds in a day (86400), to
give days-since-2000. This was rounded down with floor() to obtain midnight,
since days are precise enough for our x axis.
NOTE: You may be wondering why I used “.0” in the definition of day_s. The
reason for doing so was to avoid an 'integer division'. For example, in
gnuplot, print 2 / 4.0 returns 0.5, but print 2 / 4 returns 0. Whilst this
isn't a problem for the date values, which are all greater than day_s, it
definitely is a problem for the time values, which are all smaller (the
division will return zero).
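The difference is easy to confirm at the gnuplot prompt. The first two lines below are the ones quoted in the note; the last one is an illustrative extra that divides noon, expressed in seconds, by day_s.

```gnuplot
print 2 / 4            # both operands are integers, so the result is truncated to 0
print 2 / 4.0          # one operand is floating-point, so the result is 0.5

day_s = 86400.0
print 43200 / day_s    # noon as a fraction of a day; floating-point because of day_s
```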
The number of days was then multiplied by seconds-per-day because the values must be in seconds for the labels' time format to work properly. The label's time format is specified with
set format x "%Y-%m-%d"
NOTE: We don't really need to round date-only values, since they obviously don't contain part-days (i.e. time-of-day). I've done so here only for completeness.
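To see what the rounding does when a value does carry a time-of-day, here is a small sketch. The combined "%Y-%m-%d %H:%M" format and the variable names are only for illustration and are not used in the script below. The first print should report 1 (true), because rounding down to whole days gives the same number as parsing the date alone; the second shows the time-of-day that was discarded, in seconds.

```gnuplot
day_s = 86400.0

# The same moment parsed with and without its time-of-day
with_time = strptime("%Y-%m-%d %H:%M", "2016-04-02 21:11")
date_only = strptime("%Y-%m-%d", "2016-04-02")

# Round down to whole days, then convert back to seconds
rounded = floor(with_time / day_s) * day_s

print rounded == date_only
print with_time - rounded
```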
The process is very similar for the y (time-of-day) values; I converted the
HH:MM values to elapsed seconds since midnight. In other words, I treated the
HH:MM (which would never exceed 23:59) as elapsed time and converted it to its
equivalent in seconds (by omitting date from the format and value strings, we
cancel out its effect). As an aside, I could just as easily have used "%M:%S"
as the format string; the units might have changed, but the visual patterns
would have been the same.
The nokey, xtics and bmargin settings are there only to make the plot a bit
easier to read.
Here's the script:
reset
clear
# Seconds in a day. Use ".0" to make it floating-point.
day_s = 86400.0
# Change this to match the format of the data in your DIR listing.
date_format = "%Y-%m-%d"
date2sec(t_date) = strptime( date_format, t_date )
midnight(t_date) = floor( date2sec(t_date) / day_s ) * day_s
# Change this to match the format of the time in your DIR listing.
time_format = "%H:%M"
time2sec(t_time) = strptime( time_format, t_time )
set nokey
# Set up the x-axis label format
# In order to use date/time formats, we must indicate that the axis refers to time.
set xdata time
set format x date_format
# Leave enough space for the date labels
set bmargin 5
set xtics rotate by 60 offset -4,-4
# Set up the y-axis label format
set ydata time
set format y time_format
plot 'diroutput.txt' using (midnight(strcol(1))):(time2sec(strcol(2))) with points
set output
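If your DIR listing uses a different date layout, only the input format needs to change. The sketch below assumes a hypothetical day/month/year listing (for example 02/04/2016); label_format is a name introduced here purely for illustration, so that the tick labels can stay in YYYY-MM-DD form. Checking one sample value first is a cheap way to confirm the format string before plotting; with these values it should print 2016-04-02.

```gnuplot
# Hypothetical listing format: day/month/year, e.g. 02/04/2016
date_format  = "%d/%m/%Y"    # used only for reading the data
label_format = "%Y-%m-%d"    # hypothetical name: used only for the tick labels

# Check the input format on one sample value before plotting
print strftime(label_format, strptime(date_format, "02/04/2016"))

set xdata time
set format x label_format
```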
Copyright © Neil Carter | Last updated: 2016-05-17