Do You Trust Statistics?

One of my favorite quotes is: "You can't believe everything you read on the Internet" -- Abe Lincoln, 1868.

And that is especially true when it comes to graphs and statistics. Hardly a day goes by without me seeing a bad graph that misrepresents the data (either intentionally or unintentionally). Here is a recent bad example I was surprised to find on Statpedia ...

At first glance the graph seemed like a reasonable way to plot the data, but upon closer examination I found a terrible problem that compromises the data integrity! ... They have plotted the survey results all evenly-spaced (probably as character values), even though the surveys were not performed at evenly-spaced date intervals! This seriously misrepresents the data, especially towards the left side of the graph, when the surveys were performed much less frequently (the slope of the line is much steeper than it should be). Also, after examining the source data, I found that they had left out the value for the first/oldest survey.
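To make the spacing problem concrete, here is a minimal sketch in Python. The survey dates and values are made up for illustration (they are not the Pew numbers), and the function names are my own. It compares the slope of each line segment when the dates are treated as evenly-spaced character labels versus when they are placed proportionally to elapsed time.

```python
from datetime import date

# Hypothetical survey dates and values (illustrative only, NOT the Pew data).
# Note the 5-year gap before the second survey, then yearly surveys after that.
surveys = [
    (date(2000, 1, 1), 50.0),
    (date(2005, 1, 1), 70.0),
    (date(2006, 1, 1), 74.0),
    (date(2007, 1, 1), 78.0),
]

def category_positions(points):
    """x-positions when dates are treated as character labels: 0, 1, 2, ..."""
    return [float(i) for i in range(len(points))]

def date_positions(points):
    """x-positions proportional to elapsed time, in years since the first survey."""
    start = points[0][0]
    return [(d - start).days / 365.25 for d, _ in points]

def segment_slopes(xs, points):
    """Change in value per unit of x between each consecutive pair of points."""
    ys = [value for _, value in points]
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(ys) - 1)]

cat_slopes = segment_slopes(category_positions(surveys), surveys)
date_slopes = segment_slopes(date_positions(surveys), surveys)

print("evenly spaced labels:", [round(s, 1) for s in cat_slopes])
print("proportional dates:  ", [round(s, 1) for s in date_slopes])
```

With these made-up numbers, the first segment looks five times steeper than the others on the evenly-spaced axis, even though the underlying rate of change is about 4 points per year throughout. That is the same kind of distortion I saw on the left side of the graph.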

I followed their link to the original study on the Pew research page, and found that they also created a graph:

Pew's graph was much better than the Statpedia one -- their dates were proportionally spaced, and they included the 1995 survey value. But Pew's graph still wasn't perfect. For example, I would have liked to see a better title that completely described what the data represents. Also, the colors for the two lines were very similar, making them difficult to match up with the legend. And I think it's a little redundant to show both the "uses" and "doesn't use" lines in the graph, since they're always going to be a mirror-image of each other.

As you might have guessed, I decided to create my own graph, and make a few improvements (click my graph below to see the full size version, with html mouse-over text)...

  • My title clearly states that the data is about US adults.
  • I only show one line, and let the area above and below the line represent the two values (with the emphasis being on the "uses" rather than the "doesn't use").
  • I added reference lines along the date axis, to make it easier to visually estimate when the surveys were performed.
  • And I include markers along the plot lines, so you can visually see that the surveys did not occur at evenly-spaced time intervals.


So, apparently the Abe Lincoln quote was right on target (that guy was way ahead of his time!) If you have a favorite quote about statistics or analytics, feel free to share it in a comment.

The content originally appeared on SAS Learning Post. Go there to read the original.

Robert Allison, The Graph Guy!, SAS

Robert Allison has worked at SAS for more than 20 years and is perhaps the foremost expert in creating custom graphs using SAS/GRAPH. His educational background is in computer science, and he holds a BS, MS, and PhD from North Carolina State University. He is the author of several conference papers, has won a few graphic competitions, and has written a book called SAS/GRAPH: Beyond the Basics.

Re: Bad Graph
  • 8/7/2016 8:39:51 AM

Some graphs may be a result of "I deny your reality and substitute my own." Whether intentional or an honest mistake through ignorance or confusion, it's always good to look a little deeper to find the "best" reality possible.

Re: Bad Graph
  • 7/31/2016 2:38:48 PM

I thought he called it that because he didn't know the real name.

Unknown Quotes are the Best
  • 7/31/2016 2:27:37 PM

"...There are liars and damn liars." - unknown

Not sure if this is a statistics or analytics quote, but it ought to be.

Re: Bad Graph
  • 7/31/2016 2:23:27 PM

 "You can't believe everything you read on the Internet" -- Abe Lincoln, 1868.


I believe the term he actually used was "Interweb"; the term "net" would not come along until many years later.

Re: Bad Graph
  • 7/31/2016 8:41:43 AM

I think it is also about having more information available to confirm or deny an opinion. I remember a post in an R programming community in which a data scientist proved how MLMs are not viable business models except for the persons who were already closely associated with the entity producing the product.

I thought it was funny (I abhor those "business" schemes!) and was a good example of confirming a hypothesis that one may feel strongly about.

Re: Bad Graph
  • 7/31/2016 8:29:27 AM

Or even our feelings towards history can alter with the information given.

Re: Bad Graph
  • 7/31/2016 8:28:21 AM

Most times I think lack of proficiency with explaining details is at the heart, though dishonest representation of facts certainly occurs.

Re: Bad Graph
  • 7/31/2016 8:27:26 AM

LOL, a museum of bad graphs and decisions!  Maybe a good decision if it was sustainable! (Love it!)

Re: Bad Graph
  • 7/30/2016 10:26:33 PM

Proof that history can be rewritten!

Re: Bad Graph
  • 7/30/2016 5:47:43 PM

And I thought Al Gore invented the internet!
