March 28, 2009

Best Means, Bar None

Although bar charts are the default graph for many statistical packages, they create a number of problems. I think there are better alternatives when you’re graphing mean values with margins of error, and will try to illustrate this with a makeover of a standard bar chart. See if you agree.


Here is the raw output of the stats package, graphing nine mean values with standard deviations. The biggest problems, for me, are the obtrusive frames and tick marks, and the nasty vibration effect caused by all those high-contrast bars crammed together.


To improve this, I deleted the axes, greyed out the tick marks, shrunk the graph title, and rotated the y-axis label; all just what you’d expect from me if you’ve read other postings. I needed to distinguish between results from the main body of the river and those from its tributaries, so I tried to make the distinction intuitive by using different shades of a less-raucous color.

All very well. But should we even be using a bar chart at all?

The problem with bars is that they draw attention to a single value, the height of the bar, while minimising information about confidence intervals. Some people don’t even put error bars on their charts, which is hard to forgive. Even with error bars, it makes a big difference whether you show those intervals in both directions from the mean or only one way. Compare these three versions of the same pair of means: bars can make it harder to see whether two values might actually be the same.


Let’s try graphing those values again, starting with the error bars alone, with the mean value knocked out in white. (I created a small horizontal white line and manually aligned a copy with the top of each bar. Illustrator CS4 makes this particularly easy, with little guides popping up to give you visual feedback on whether things are lined up exactly. Then I deleted the bars themselves and thickened the remaining vertical lines.)
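If you'd rather script this step than line things up by hand, here is a minimal sketch of the same idea: thick vertical error bars with the mean knocked out as a short white line. It is not how the figure above was made (that was done manually in Illustrator); it simply builds an SVG string using Python's standard library. The `SiteSummary` class, the `error_bars_svg` function, the colors, the dimensions, and the numbers are all made up for illustration.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class SiteSummary:
    label: str     # site name shown under the bar (hypothetical)
    mean: float    # mean value for the site
    margin: float  # half-length of the error bar, e.g. one standard deviation


def error_bars_svg(sites, width=360, height=200, pad=30, stroke=8):
    """Return an SVG string: one thick vertical error bar per site,
    with the mean drawn as a short white horizontal line across it."""
    top = max(s.mean + s.margin for s in sites)
    bottom = min(0.0, min(s.mean - s.margin for s in sites))
    span = top - bottom  # assumed to be non-zero

    def y(value):
        # SVG y grows downward, so larger values map to smaller y.
        return pad + (top - value) / span * (height - 2 * pad)

    step = (width - 2 * pad) / len(sites)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="{width}" height="{height}" fill="white"/>',
    ]
    for i, s in enumerate(sites):
        x = pad + step * (i + 0.5)
        # The error bar: from mean - margin up to mean + margin.
        parts.append(
            f'<line x1="{x:.1f}" y1="{y(s.mean - s.margin):.1f}" '
            f'x2="{x:.1f}" y2="{y(s.mean + s.margin):.1f}" '
            f'stroke="#4a7a9c" stroke-width="{stroke}"/>'
        )
        # The mean, knocked out in white across the width of the bar.
        parts.append(
            f'<line x1="{x - stroke / 2:.1f}" y1="{y(s.mean):.1f}" '
            f'x2="{x + stroke / 2:.1f}" y2="{y(s.mean):.1f}" '
            f'stroke="white" stroke-width="2"/>'
        )
        # A small grey label under each bar.
        parts.append(
            f'<text x="{x:.1f}" y="{height - 8}" font-size="10" '
            f'text-anchor="middle" fill="#666">{escape(s.label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up numbers, purely for illustration.
sites = [
    SiteSummary("1", 12.0, 3.0),
    SiteSummary("2", 15.5, 2.0),
    SiteSummary("3", 7.0, 4.5),
]
svg = error_bars_svg(sites)
```

Saving `svg` to a file with an `.svg` extension should give you something a browser or a vector editor can open, so the result can still be tidied by hand afterwards.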


Now we’re seeing a much better picture of the uncertainty of our results. We might decide we need better data for site 8 if we want to be sure whether it’s in the low or high group; it may well have the same mean as site 9.
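The "could these two be the same?" question can also be roughed out in a few lines. This is only a sketch of the visual check described above, namely whether two error bars overlap; it is not a formal significance test, and the function names and numbers are invented for illustration rather than taken from the chart.

```python
def interval(mean, margin):
    """Return the (low, high) ends of an error bar drawn both ways from the mean."""
    return (mean - margin, mean + margin)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical (mean, margin) pairs, not the values from the chart.
site_8 = interval(14.0, 5.0)  # (9.0, 19.0)
site_9 = interval(20.0, 3.0)  # (17.0, 23.0)
site_3 = interval(6.0, 1.5)   # (4.5, 7.5)

print(intervals_overlap(site_8, site_9))  # True: the bars overlap
print(intervals_overlap(site_3, site_9))  # False: clearly separated
```

A bar chart showing only the tops of these three values would hide exactly this: with these invented numbers, the wide interval for site 8 reaches well into the range of site 9.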

The other advantage of using lines rather than bars is that they take up much less room. I can fit almost twice as much information into the same space, which allows me to compare two different sample localities side by side.


So next time you’re faced with a whole page of tiny bar charts, you may want to consider pulling out the error bars and displaying them on their own. Probably more honest, and certainly more compact.

Thanks again to EOS Ecology for permission to use work I did for them in Pictures of Numbers—the data are real, the locations have been changed.



Pictures of Numbers is a book-project-in-progress, consisting of practical tips and techniques for busy researchers on improving their data presentation, and is updated in intermittent bursts of regularity by Mike Dickison. Mike did his Zoology PhD at Duke, and has run data presentation and design workshops for scientists in New Zealand, Australia, and the USA since 1995; he’s a Learning Advisor at the University of Canterbury in Christchurch, NZ, and does some information design consulting. If you have an information graphic you'd like him to troubleshoot, gratis, and don't mind the results being posted to this site, drop him a line.
Pictures of Numbers has a Creative Commons License. Images created by others retain the © of their respective owners, and their reproduction here for educational and critical purposes constitutes fair use.