gnuplot Histograms

gnuplot's histogram plotting style (which is rather similar to the boxes style) has four subtypes (clustered, errorbars, columnstacked and rowstacked), each of which is demonstrated here. We'll use the following tab-separated data file, saved as date_mins.tsv:

Date	Level_0_Mins	Level_1_Mins	Level_2_Mins
2011-01-08	30.34	22.58	161.08
2011-01-15	23.83	20.33	104.00
2011-01-22	50.50	16.17	79.75
2011-01-29	67.59	21.74	99.25
2011-02-05	37.58	33.33	155.33
2011-02-12	48.17	44.33	66.00
2011-02-19	89.34	12.42	91.42
2011-02-26	113.09	35.83	123.34
2011-04-02	174.25	105.25	221.25
2011-04-09	98.09	55.92	109.00
2011-04-16	98.67	30.83	202.00
2011-04-23	87.17	58.25	127.09
2011-04-30	139.74	67.33	232.84
2011-04-30	20.0	10.0	30.0

Note that there are two entries (the last two lines) for 2011-04-30; this is intentional, to demonstrate a point about time-based x values. As described in my page on time-based histograms, gnuplot's histogram style does not support date/time values (as opposed to labels) on the x axis: each row simply occupies the next slot along the axis, and column one is used only as tic-label text. Hence, there is no gap where March should be, and 2011-04-30 appears twice in the following plots. Try using set xdata time and you'll see what I mean; you'll get the error message "need full using spec for x time data". Similarly, set timefmt on its own has no effect, because the dates are only ever used as labels.
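A minimal script for trying this yourself, assuming the data above is saved as date_mins.tsv (the name used in the plots below). The timefmt string is my assumption, chosen to match the dates in column one; going by the paragraph above, gnuplot should refuse to draw the plot and report "need full using spec for x time data" instead.

```gnuplot
# Attempt to give a histogram a real time axis (expected to fail).
reset
set style data histogram
set xdata time
# Dates in column 1 look like 2011-01-08 (assumed format string).
set timefmt "%Y-%m-%d"
# Histograms take only y values plus labels, so there is no x column here;
# per the text, gnuplot reports "need full using spec for x time data".
plot 'date_mins.tsv' using 2:xticlabels(1)
```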

The following commands are common to all four of the histograms shown on this page.

clear
reset
unset key
# Make the x axis labels easier to read.
set xtics rotate out
# Plot data with the histogram style unless told otherwise.
set style data histogram
# Give the bars a plain fill pattern, and draw a solid line around them.
set style fill solid border

Clustered

With the clustered style, each selected column of the data file supplies one bar in every cluster, whilst each row of the data file becomes one cluster. Thus, using the following plot commands, we get three bars in each cluster and one cluster for each row (notice the two clusters for 2011-04-30, which is what we expect).

set style histogram clustered
plot for [COL=2:4] 'date_mins.tsv' using COL:xticlabels(1) title columnheader
Figure 1: Clustered Histogram

Note the use of the for feature. This lets a single plot command iterate over several columns (2 to 4, in this case). Without the for, each column has to be listed as a separate plot element; a lone using spec (something like plot 'date_mins.tsv' using 3:xticlabels(1)) plots only one column of data, which defeats the purpose of the clustered histogram.
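For comparison, here is a sketch of the same clustered plot written without for, with one plot element per column; '' reuses the previous file name. This follows the pattern used in gnuplot's own histogram demos, and it assumes the tic labels taken from column one in the first element are enough to label every cluster.

```gnuplot
# Clustered histogram with each column listed explicitly instead of using 'for'.
set style histogram clustered
# Column 2 gives the first bar of each cluster; column 1 gives the x tic labels.
# '' means "the same file as the previous plot element".
plot 'date_mins.tsv' using 2:xticlabels(1) title columnheader, \
     ''              using 3 title columnheader, \
     ''              using 4 title columnheader
```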


Errorbars

The next histogram type is errorbars. For this, gnuplot can accept up to three columns for y values: the main value for the height of the bar, and a minimum and maximum value for the errorbar. Thus:

# We need to set lw in order for error bars to actually appear.
set style histogram errorbars linewidth 1
# Make the bars semi-transparent so that the errorbars are easier to see.
set style fill solid 0.3
set bars front
plot 'date_mins.tsv' using 2:3:4:xticlabels(1) title columnheader

This code uses column two for the bar size, column three for the errorbar's minimum, and column four for the maximum. Column one is again used for the x axis labels. If you're not careful, the errorbars might not appear (use the linewidth option), or the lower part might be obscured by the bar (make the fill pattern for the bar semi-transparent).

Figure 2: Errorbars Histogram
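The errorbars style also accepts a two-column form, y and yerror, in which each errorbar runs from y-yerror to y+yerror. The sketch below reuses column three as a symmetric error purely to show the syntax; this data file has no real error column, so that pairing is an illustrative assumption.

```gnuplot
# Errorbars histogram, two-column form: bar height from column 2,
# symmetric errorbar of +/- the value in column 3 (illustration only).
set style histogram errorbars linewidth 1
set style fill solid 0.3
set bars front
plot 'date_mins.tsv' using 2:3:xticlabels(1) title columnheader
```

In this data every value in column three is smaller than the matching value in column two, so none of these errorbars would dip below zero.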

Column-stacked

The next histogram type is columnstacked. With this style, each bar of the histogram corresponds to one column of the data. Thus, with our data, we get three bars. Each bar is made up of a stack of 'slices', each corresponding to one row of the data file (the labels for the x axis come from the column headings, so we don't use xticlabels for this plot). Given that we have more rows than columns, the resulting plot looks rather complex, and is not entirely suitable for visualising the data we have.

Note that we use slightly narrower bars for this and the row-stacked histograms; this makes them somewhat more pleasing to my eye.

set style histogram columnstacked
set boxwidth 0.6 relative
plot for [COL=2:4] 'date_mins.tsv' using COL title columnheader

Note that the date values from the first column aren't used in this plot.

Figure 3: Column-stacked Histogram
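Because each column-stacked bar piles up every row's value for one column, its total height is simply that column's sum. A quick way to check the stack heights is the stats command; this sketch assumes gnuplot 5.0 or later, for the skip modifier that ignores the header line.

```gnuplot
# Print the total height of each column-stacked bar (i.e. each column's sum).
# 'skip 1' ignores the header line (assumes gnuplot 5.0 or later).
do for [COL=2:4] {
    stats 'date_mins.tsv' skip 1 using COL nooutput
    print sprintf("Column %d total: %.2f", COL, STATS_sum)
}
```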


Row-stacked

The final histogram type is rowstacked. With this style, each bar corresponds to one row of the data file, with the bars consisting of slices corresponding to the data columns. Thus, we get 14 bars each with three slices.

set style histogram rowstacked
set boxwidth 0.6 relative
plot for [COL=2:4] 'date_mins.tsv' using COL:xticlabels(1) title columnheader
Figure 4: Row-stacked Histogram
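Conversely, each row-stacked bar is as tall as the sum of its row. One way to see this, again assuming gnuplot 5.0 or later for skip, is to plot the row totals as a single-column histogram; the bar heights should match the tops of the stacks in Figure 4.

```gnuplot
# Plot each row's total (the height of its row-stacked bar) as plain bars.
# 'skip 1' ignores the header line (assumes gnuplot 5.0 or later).
set style histogram clustered
plot 'date_mins.tsv' skip 1 using ($2+$3+$4):xticlabels(1) title 'Total'
```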



Copyright © Neil Carter

Content last updated: 2015-09-02