gnuplot Summing Columns

Sometimes, one needs to plot one's data grouped in some way. For instance, where daily data would be too cluttered; the same data, summed into weekly values would be more readable.

Consider the following data (here's the full data set):


Date	WeekCommencing	Measure
2014-01-07	2014-01-06	8
2014-01-10	2014-01-06	11
2014-01-10	2014-01-06	18
2014-01-16	2014-01-13	18
2014-01-16	2014-01-13	12
2014-01-17	2014-01-13	7
2014-01-17	2014-01-13	9
2014-01-23	2014-01-20	8
2014-01-24	2014-01-20	12
2014-01-29	2014-01-27	8
2014-02-03	2014-02-03	8
2014-02-07	2014-02-03	10
...

You might generate a daily-value plot of this using the following gnuplot commands:


reset

# Tab separated values
set datafile separator "\t"
# Ignore lines containing column headings
set datafile commentschars "DW"
# Don't include a key since there is only one range being plotted
set nokey

set title "Individual Measures"
set ylabel "Count"

# X-axis values are dates.
set xdata time
set timefmt "%Y-%m-%d"

# This controls how the x-axis tick labels are formatted.
set format x "%Y-%m-%d"

# The data labels are quite long, so write them vertically
# to prevent them from overlapping.
set xtics rotate by 90 offset 0,-4

# We need a larger space at the bottom of the graph to fit in the dates.
set bmargin 5

plot 'summing_columns.txt' using 1:3

This produces the following plot:

Figure 1: Frequency Plot for Daily Values

Figure 1: Frequency Plot for Daily Values

This is rather cluttered. Note that if we had drawn the values with impulses, the plot might be easier to read, but lower values (at the same x-position) would be hidden behind/under higher values. The clutter (and overlapping x-position values) can be avoided by summing the data for each week, using the "WeekCommencing" field, thus:


set output "summing_columns_weekly.png"
set title "Weekly Summed Measures"
set ylabel "Weekly Sum of Value"
plot '<awk "{W[$2]+=$3} END {for (w in W) print w\"\t\"W[w]}" summing_columns.txt | sort' using 1:2 with impulses

This generates the following plot:

Figure 2: Frequency Plot for Summed Weekly Values

Figure 2: Frequency Plot for Summed Weekly Values

This is much easier to read, without losing any information.

The trick here is to use the Awk program. Awk is a handy program designed for manipulating tabular data such as we have here (and such as we usually have when plotting). Awk for Windows is available from the GnuWin32 project at SourceForge. Don't be put off by the "32"; it works fine on 64 bit Windows.

The plotting command is not as complicated as it seems:


Two Asides:

Adding a Trend Line

If you want to plot a sort of average trend line for the weekly data, try the following command; this is where the |sort becomes necessary (try it without the sort):


plot '<awk "{W[$2]+=$3} END {for (w in W) print w\"\t\"W[w]}" summing_columns.txt | sort' using 1:2 with impulses,\
	'' using 1:2 smooth bezier

Note the smooth bezier plotting style, which produces the following:

Figure 3: Summed Weekly Values with Bezier Trend Line

Figure 3: Summed Weekly Values with Bezier Trend Line

Frequency Histogram

The smooth command can be used to produce a frequency histogram (without actually using gnuplot's histogram feature). We do this by using it with the frequency option. In doing so, we need to change the x-axis to use the third column values (the ones that we used for the y-axis in the previous plots). For the y-axis, we use the numeric value of one. By doing this, every time gnuplot encounters a given x-value, it adds one to the tally for that value. In other words, it counts the number of times each x-axis value appears. We specify a numeric value (as opposed to a column number) in the using part, by placing brackets around it, hence 2:(1) means x=column 2, and y=number 1.

The other set commands are there to change the settings left over from the previous (time-based) plots. You could just use reset


set title "Distribution of Weekly Summed Values"
set xlabel "Summed Value"
set ylabel "Weekly Frequency"

# Turn off date/time data mode
set xdata
# Change x-axis labels to whole numbers
set format x "%.0f"

# Turn off the big margin and vertical writing for the x-axis
set bmargin 3
set xtics rotate by 0 offset 0,0

# impulses will be a bit thin, so use boxes instead
set boxwidth 0.5
set style fill solid 0.5

plot 'summing_columns.txt' using 3:(1) smooth frequency with boxes
Figure 4: Distribution of Summed Weekly Values

Figure 4: Distribution of Summed Weekly Values



Home About Me
Copyright © Neil Carter

Last updated: 2014-11-24