Histograms are handy for showing how several values combine to form a whole, sometimes cumulatively. Such data is often time-based: the several values might be incoming and outgoing traffic on a network hub, or effort expended within various categories, whilst the time basis might be weekly totals of those values (for example, incoming vs. outgoing traffic per week).
Unfortunately, the current version of gnuplot does not support explicit
x-axis values for histograms. Instead, the x values are derived automatically
from the line number in the data file. For instance, if you combine a
time-based x axis (set xdata time) with set style data histograms, you'll get
the error “Need full using spec for x time data”. This web page shows one
way of solving this problem.
First, let's look at the ordinary histogram approach to see the problem. Consider the contents of the data file date_mins.tsv:
    Date        Level_0_Mins  Level_1_Mins  Level_2_Mins
    2011-01-08  30.34         22.58         161.08
    2011-01-15  23.83         20.33         104.00
    2011-01-22  50.50         16.17         79.75
    2011-01-29  67.59         21.74         99.25
    2011-02-05  37.58         33.33         155.33
    2011-02-12  48.17         44.33         66.00
    2011-02-19  89.34         12.42         91.42
    2011-02-26  113.09        35.83         123.34
    2011-04-02  174.25        105.25        221.25
    2011-04-09  98.09         55.92         109.00
    2011-04-16  98.67         30.83         202.00
    2011-04-23  87.17         58.25         127.09
    2011-04-30  139.74        67.33         232.84
    2011-04-30  20.0          10.0          30.0
Notice that there are no entries for March (2011-03-??). Now, we probably expect any plot of this data to leave a gap where March's entries would have been. Also, there are two entries for 2011-04-30; this is intentional to demonstrate certain behaviours.
If we want to plot the data file above with the histograms
plotting style, we can't specify that the x axis is time-based. Because
gnuplot's histogram feature uses the line number (or, more specifically, the
data row index) rather than the actual x value (time-based or otherwise), it
doesn't detect any gap in the x values, so none is shown in the
histogram plot below.
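To see the limitation first-hand, here is a minimal sketch that tries to combine a time-based x axis with the histograms style. It assumes date_mins.tsv is in the working directory, and it is expected to stop with an error along the lines of “Need full using spec for x time data” rather than draw anything.

```gnuplot
# Minimal reproduction sketch; assumes date_mins.tsv is in the working directory.
reset
set key autotitle columnhead

# Ask for a time-based x axis...
set xdata time
set timefmt "%Y-%m-%d"

# ...and then for the histograms style, which takes its x position from
# the row index and so is given only a y column in the using clause.
set style data histograms
plot "date_mins.tsv" using 2:xticlabels(1)
```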
Even though gnuplot does not use the date values in the first column to
determine the x value in the plot, we can still have the dates show up as
x-labels. This is achieved by adding
:xticlabels(1) after the y
column number in the using clause of the plot command.
Note, also, that gnuplot has to be told that the first line of the data file contains column headings, which can be used to label the plot. Moreover, the heading values (for each column) should not contain spaces, since that would also confuse gnuplot.
    reset
    clear

    # If we don't use columnhead, the first line of the data file
    # will confuse gnuplot, which will leave gaps in the plot.
    set key top left outside horizontal autotitle columnhead

    set xtics rotate by 90 offset 0,-5 out nomirror
    set ytics out nomirror

    set style fill solid border -1
    # Make the histogram boxes half the width of their slots.
    set boxwidth 0.5 relative

    # Select histogram mode.
    set style data histograms
    # Select a row-stacked histogram.
    set style histogram rowstacked

    plot "date_mins.tsv" using 2:xticlabels(1) lc rgb 'green', \
         "" using 3 lc rgb 'yellow', \
         "" using 4 lc rgb 'red'
Clearly, the plot above has no gap for March; gnuplot treats histogram 'bins'
as being based on category data, not ordinal data (to be more
accurate, the bins do have an ordering: their position in the file, but we might
have expected an ordering based on date). There is, however, a workaround to the
date-ordering problem, which uses the
boxes (as opposed to
histograms) plotting style. This is demonstrated in the example below.
Another special technique used in the following script is the use of data
values instead of data columns in the plot
command. Immediately after the
using keyword, we see
1:($2+$3+$4). The use of brackets and the dollar sign tells gnuplot
that we mean an expression for it to evaluate, rather than a column number. So,
in this command, we want to use column one for the x values, but for the y
values, we want the sum of the values
in columns two, three, and four.
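As a standalone illustration of this notation, separate from the full script, the sketch below plots the same sum in two ways. It assumes date_mins.tsv is in the working directory and uses gnuplot's pseudo-column 0 (the row index) for x so that no time parsing is involved. The key titles are made up for this example.

```gnuplot
reset
# Treat the first line of the file as column headings rather than data.
set key autotitle columnhead

# A bare number in a using clause selects a column; anything in
# parentheses is an expression evaluated once per data row.
# $2 is shorthand for column(2), so both plots draw the same values.
plot "date_mins.tsv" using 0:($2+$3) with linespoints title 'sum via $N', \
     "" using 0:(column(2)+column(3)) with points title 'sum via column(N)'
```

If the two notations are equivalent, every point of the second plot should sit exactly on a point of the first.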
    reset
    clear

    set key top left outside horizontal autotitle columnhead

    set xtics rotate by 90 offset 0,-5 out nomirror
    set ytics out nomirror

    # This won't affect histogram plots since they just treat the
    # dates in the first columns as literal strings.
    set format x "%Y-%m-%d"

    # Setting xdata to time precludes the use of histograms.
    set xdata time
    set timefmt "%Y-%m-%d"

    set style fill solid border -1
    # 1 week = 604,800 seconds.
    # Make the box 50% of its slot.
    set boxwidth 302400 absolute

    # ($2+$3) is an expression meaning 'add the values in column 2 and
    # column 3'; this is effectively the same as row-stacking.
    # Data-series should be given in order of decreasing magnitude.
    plot "date_mins.tsv" using 1:($2+$3+$4) with boxes lc rgb "red", \
         "" using 1:($2+$3) with boxes lc rgb "yellow", \
         "" using 1:2 with boxes lc rgb "green"
This produces the following plot; note that we now have a gap where March is. Also, the order of the items in the key is reversed.
Be careful to ensure that the (summed) columns are mentioned in order of decreasing size in the plot command. Boxes are drawn in the order given, so a larger box mentioned later in the plot command would obscure the smaller ones drawn before it.
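The box width in the script above is written as a literal number of seconds. As a sketch of where that number comes from, the same setting can be computed from gnuplot variables. The variable names here are illustrative and do not appear in the original script.

```gnuplot
# Hypothetical variable names; the resulting width matches the script above.
seconds_per_week = 7 * 24 * 60 * 60
box_fraction = 0.5

# With a time-based x axis, x values are measured in seconds, so an
# absolute box width must be given in seconds too.
set boxwidth box_fraction * seconds_per_week absolute

# Show the slot width for checking: 604800
print seconds_per_week
```

Writing the width this way makes it easier to adapt the script to data with a different spacing, such as daily rather than weekly rows.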
Note that the last bar has a small horizontal line near the bottom. Now that gnuplot is treating the first column as time values, we have two rows of data for the same time (i.e. the same x position), so it draws the second duplicate row on top of the first, and the result isn't entirely successful. Clearly, whilst histogram plots handle duplicate x values by creating new bars for them, duplicates don't work when x values are treated as numerical (be they dates, times, or simple numbers).
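One possible way round the duplicate-date problem, which is not part of the original scripts, is to let gnuplot merge rows that share an x value. The sketch below assumes that summing the duplicate rows is what you want, and that date_mins.tsv is in the working directory. It relies on gnuplot's smooth frequency option, which replaces points having the same x value with a single point whose y value is their sum.

```gnuplot
reset
set key top left outside horizontal autotitle columnhead
set xtics rotate by 90 offset 0,-5 out nomirror
set format x "%Y-%m-%d"
set xdata time
set timefmt "%Y-%m-%d"
set style fill solid border -1
set boxwidth 302400 absolute

# smooth frequency sums the y values of rows that share an x value, so
# the two 2011-04-30 rows should become a single box in each layer.
plot "date_mins.tsv" using 1:($2+$3+$4) smooth frequency with boxes lc rgb "red", \
     "" using 1:($2+$3) smooth frequency with boxes lc rgb "yellow", \
     "" using 1:2 smooth frequency with boxes lc rgb "green"
```

If the duplicate rows are errors rather than amounts to be added together, it is safer to correct the data file than to merge them in the plot.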
Copyright © Neil Carter
Content last updated: 2012-02-28