And that is especially true when it comes to graphs and statistics. Hardly a day goes by without my seeing a bad graph that misrepresents the data (either intentionally or unintentionally). Here is a recent bad example I was surprised to find on Statpedia ...
At first glance, the graph seemed like a reasonable way to plot the data, but on closer examination I found a terrible problem that compromises the data integrity! They plotted the survey results at evenly spaced positions (probably treating the dates as character values), even though the surveys were not performed at evenly spaced date intervals! This seriously misrepresents the data, especially toward the left side of the graph, where the surveys were performed much less frequently: those long gaps are squeezed into the same horizontal distance as the much shorter gaps between later surveys, so the slope of the line is much steeper than it should be. Also, after examining the source data, I found that they had left out the value for the first (oldest) survey.
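To make the slope problem concrete, here is a minimal Python sketch. The survey dates and percentages are made up for illustration (they are not the Pew or Statpedia figures), and the function names are my own. It compares the slope you see when every survey is drawn one step apart with the slope you get when the x-axis is real elapsed time.

```python
from datetime import date

# Hypothetical survey results (date, percent who "use"); NOT the real survey data.
surveys = [
    (date(1995, 6, 1), 14.0),
    (date(2000, 6, 1), 50.0),
    (date(2005, 6, 1), 70.0),
    (date(2006, 6, 1), 72.0),
    (date(2007, 6, 1), 74.0),
]

def slopes_per_step(points):
    """Slope between neighbors when every survey is drawn one x-unit apart."""
    return [b[1] - a[1] for a, b in zip(points, points[1:])]

def slopes_per_year(points):
    """Slope between neighbors when x is the real elapsed time in years."""
    return [
        (b[1] - a[1]) / ((b[0] - a[0]).days / 365.25)
        for a, b in zip(points, points[1:])
    ]

# Print both slopes side by side for each pair of consecutive surveys.
for step, year in zip(slopes_per_step(surveys), slopes_per_year(surveys)):
    print(f"per survey: {step:5.1f}   per year: {year:5.1f}")
```

With these made-up numbers, the first segment covers about five years, so the evenly spaced version draws it roughly five times steeper than the per-year slope, while the one-year segments at the right end look about the same either way.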
I followed their link to the original study on the Pew Research page, and found that they had also created a graph:
Pew's graph was much better than the Statpedia one -- their dates were proportionally spaced, and they included the 1995 survey value. But Pew's graph still wasn't perfect. For example, I would have liked to see a better title that completely described what the data represents. Also, the colors for the two lines were very similar, making them difficult to match up with the legend. And I think it's a little redundant to show both the "uses" and "doesn't use" lines in the graph, since the two lines are always going to be mirror images of each other.
As you might have guessed, I decided to create my own graph, and make a few improvements (click my graph below to see the full-size version, with HTML mouse-over text)...
- My title clearly states that the data is about US adults.
- I only show one line, and let the area above and below the line represent the two values (with the emphasis being on the "uses" rather than the "doesn't use").
- I added reference lines along the date axis, to make it easier to visually estimate when the surveys were performed.
- And I include markers along the plot lines, so you can visually see that the surveys did not occur at evenly-spaced time intervals.
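The layout ideas in that list can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code behind my graph: the function names, the 800-pixel width, and the survey dates are all hypothetical. It assumes the two values are complements that add up to 100 percent, which is what lets a single line stand in for both.

```python
from datetime import date

def x_position(d, start, end, width=800):
    """Map a date to a horizontal pixel offset proportional to elapsed days."""
    return width * (d - start).days / (end - start).days

def year_reference_lines(start, end, width=800):
    """Pixel offsets for January 1 of each year after start.year, through end.year."""
    return {
        y: x_position(date(y, 1, 1), start, end, width)
        for y in range(start.year + 1, end.year + 1)
    }

def does_not_use(uses_percent):
    """The complement is the area above the single line, so it needs no line of its own."""
    return 100.0 - uses_percent

# Hypothetical survey dates, unevenly spaced on purpose.
survey_dates = [date(1995, 1, 1), date(2000, 1, 1), date(2004, 1, 1), date(2005, 1, 1)]
start, end = survey_dates[0], survey_dates[-1]

# Marker positions: wide gaps for the early surveys, narrow gaps for the later ones.
marker_x = [x_position(d, start, end) for d in survey_dates]
reference_x = year_reference_lines(start, end)
```

Because the markers are placed by elapsed days rather than by survey number, the uneven spacing shows up directly in `marker_x`, and the yearly reference lines give the eye a regular grid to compare it against.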
So, apparently the Abe Lincoln quote was right on target (that guy was way ahead of his time!). If you have a favorite quote about statistics or analytics, feel free to share it in a comment.
The content originally appeared on SAS Learning Post. Go there to read the original.