Graphs: Comparing R, Excel, Tableau, SPSS, Matlab, JS, Python, and SAS


Are you a visualization and graphing expert? Can you identify which tool (R, Excel, Tableau, SPSS, Matlab, JS, Python, or SAS) was used to create each of these graphs? No cheating!

I recently read Tim Matteson's blog where he presented 18 graphs, and had his readers try to guess which software was used to create each of them. I thought it was an interesting exercise, but I was a little disappointed in the graphs. My buddy Paul Kent said I should create my own new/improved version of each graph, and I thought that sounded like a splendid idea! Be sure to click the link above to see the original versions, so you can better appreciate the improvements.

Can you determine which software I used to create each of my improved versions?

Chart 1

The biggest problem in the original graph was that the colors and order of the bar segments didn't make sense: it seems like they should run bad-to-good, but the original graph had them in alphabetical order. Also, the Xnn labels along the left-side axis were cluttered and difficult to read. In my version I spaced the labels out more and left-aligned them, so the 'X's line up and the labels are easier to read.
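As a rough illustration of the bad-to-good ordering idea (this is not the SAS code behind the chart), here is a minimal Python sketch. The category names and counts are made up for the example, since the actual survey labels aren't listed here.

```python
# Hypothetical response categories, listed from worst to best.
# (Assumed for illustration; the chart's real labels may differ.)
BAD_TO_GOOD = ["Very poor", "Poor", "Fair", "Good", "Excellent"]


def order_segments(counts):
    """Return (category, count) pairs in bad-to-good order, not alphabetical."""
    rank = {name: position for position, name in enumerate(BAD_TO_GOOD)}
    return sorted(counts.items(), key=lambda item: rank[item[0]])


# A dict built in alphabetical order, the way many tools would stack it by default.
counts = {"Excellent": 12, "Fair": 30, "Good": 25, "Poor": 8, "Very poor": 3}
ordered = order_segments(counts)
```

Stacking the segments in the order returned by `order_segments` (and assigning colors along the same sequence) keeps the visual order in step with the meaning of the categories.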

Chart 2

In the original chart, the colored area behind the questions made it look (at first glance) like those were bars, so I didn't color that area in my graph. I was a bit confused by the numbers to the left and right of the bars in the original, so in my version I color-coded these numbers to show at a glance that the left number represents 'disagree' and the right number represents 'agree'. In survey data like this, I think it's important to be able to see whether over 50% of the respondents agree or disagree, so I added a reference line at 50%.

Chart 3

In the original chart, they had the axis labels along both the left and bottom, showing each label twice. In my plot, I placed the labels in the diagonal boxes, allowing me to show each label only once (and also eliminating the sideways labels along the left axis). I used transparent plot markers, so you can see where markers are stacking. I also used a different color for the markers than for the axes and text, so the markers stand out more.

Chart 4

The original chart used so many grid lines that I found it difficult to follow a line to the axis. I used years rather than months along the x-axis, because that seemed easier to understand for such a long time period (quick - how many years is 70 months!?! see what I mean!)
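To make the months-versus-years point concrete, here is a small Python sketch (not the SAS code used for the chart). The function names are hypothetical, and it assumes the x-axis values are plain month counts starting at 0.

```python
def describe_months(months):
    """Express a month count as whole years plus leftover months."""
    years, remainder = divmod(months, 12)
    return f"{years} years, {remainder} months"


def yearly_ticks(total_months):
    """Return (month_position, label) pairs, one tick per whole year."""
    return [(m, f"{m // 12}") for m in range(0, total_months + 1, 12)]


answer = describe_months(70)   # the "quick quiz" from the text
ticks = yearly_ticks(70)       # tick positions stay in months, labels are years
```

Here `answer` is "5 years, 10 months", and the axis needs only six ticks (years 0 through 5) instead of dozens of month labels.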

Chart 5

For this one, I left it pretty much as-is, except I placed the labels inside the longer bars (rather than outside), thereby making more room for the bars. I also explain what 'cola' is in the title, since it's an acronym most people probably aren't familiar with - wouldn't want people thinking this was a graph about soft drinks!

Chart 6

For this chart, I didn't have the original data, so I decided to go with some data that was similar, but less dense. I'm not sure what the original chart was trying to show, but I can't imagine it was doing a very good job of it (looked like a cluttered mess of points & lines to me).

Chart 7

In the original chart, I don't think the circles showed up very well against the black background - therefore I didn't put any circles on my version (if you want to see a black map with circles, have a look at my map with animated circles). Be sure to click here, to see the full size map (to get the full effect)!

Chart 8

The original chart was a simple scatter, with '+' markers, and dark grid lines. In my version, I used transparent round markers - this way you can see when multiple markers are stacked in the same location. I also use light grid lines, so the grid doesn't compete with the markers for your attention. I also added some summary statistics in the top/left corner of the graph.
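The text doesn't say which summary statistics appear in the corner of the graph, so the following Python sketch simply assumes a typical set for a scatter plot: the number of points, the two means, and the Pearson correlation. The data values are invented for the example.

```python
import math
from statistics import mean


def summarize(xs, ys):
    """Compute n, the means, and the Pearson correlation for paired data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return {"n": len(xs), "mean_x": mx, "mean_y": my, "r": sxy / math.sqrt(sxx * syy)}


def corner_label(stats):
    """Format the statistics as short lines suitable for a corner annotation."""
    return [
        f"n = {stats['n']}",
        f"mean x = {stats['mean_x']:.2f}",
        f"mean y = {stats['mean_y']:.2f}",
        f"r = {stats['r']:.2f}",
    ]


stats = summarize([1, 2, 3, 4], [2, 4, 6, 8])  # made-up, perfectly linear data
label_lines = corner_label(stats)
```

Keeping the annotation to a few short lines lets it sit in an empty corner without covering any markers.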

Chart 9

I'm not a big fan of using black backgrounds in a graph ... but if you're going to create any kind of graph, at least show the scales along the sides!

Chart 10

This is another one I didn't have the exact data for, so I used some similar data. The biggest change I made was using transparent markers so you can see where multiple markers are stacked on top of each other. I also use a grid of reference lines from both axes, rather than just one axis.
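Why transparent markers reveal stacking can be shown with a little arithmetic. Under standard alpha blending, each additional marker covers part of what still shows through, so n overlapping markers of opacity a have a combined opacity of 1 - (1 - a)^n. The Python sketch below just evaluates that formula; the 0.3 opacity is an assumed value, not the setting used in the chart.

```python
def stacked_opacity(alpha, count):
    """Combined opacity of `count` identical markers drawn on top of each other."""
    return 1.0 - (1.0 - alpha) ** count


# Assumed marker opacity of 0.3: one, two, and three overlapping markers.
single = stacked_opacity(0.3, 1)
double = stacked_opacity(0.3, 2)
triple = stacked_opacity(0.3, 3)
```

With these numbers a lone marker is 30% opaque, two stacked markers are about 51%, and three are about 66%, so denser spots look visibly darker while opaque markers would all look the same.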

Chart 11

Although the original chart didn't have any labeling, I suspect it was some of Fisher's classic iris data set, so I used some of that data in my chart. The first improvement I made was labeling the graph, so you quickly know what I'm plotting. I also added a labeled picture of an iris flower, so you know what a petal and a sepal are.

Chart 12

I'm not a big fan of using 3d bars on a 3d map to show data, like they did in the original graph - the taller/front bars inevitably obscure some of the shorter/back bars, etc. Therefore in my graph I show how to plot data as markers on a 2d street map.

Chart 13

In the original chart, I'm not sure exactly which year(s) of earthquake data they use, since there is no title or label. In my chart, I show all the major earthquakes for a 40+ year time period, and I also center my map on the Pacific ocean (so it better shows the 'ring of fire'). I also use circles rather than filled dots, so it's easier to see almost-overlapping markers.

Chart 14

In charts like this, I really don't like when people use a diverging color scheme (gradient shades of 2 colors, meeting in the middle) - those should be used when the scale goes from bad-to-good, etc. In this case, where the colors represent a simple "Percent of Trials", gradient shades of a single color should be used. They left-justified their Cancer Conditions, which placed them far from the chart and made it difficult to see which colored blocks went with which label - I right-justified them. Also, it was difficult to determine whether white boxes were light gradients or no-data. In my chart, I use a hatched pattern for no-data, to make the distinction more obvious.

And in the bottom (bar) chart portion, I was a bit confused by the numbers on top of the bars - after scrutinizing the graph for a bit, I found that the numbers represent the difference between the Actual and Expected time. Therefore I tried to make that more obvious in my bar chart.
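For the heat-map portion of Chart 14, the single-color gradient with a separate no-data marker can be sketched in a few lines of Python (again, not the SAS code behind the chart). The dark blue end color and the "hatch" placeholder are assumptions for the example.

```python
def sequential_color(percent, dark=(8, 48, 107)):
    """Map 0-100% onto shades from white to one dark color.

    Missing values (None) get a separate 'hatch' marker instead of a shade,
    so that no-data can never be mistaken for a very light color.
    """
    if percent is None:
        return "hatch"
    t = max(0.0, min(1.0, percent / 100.0))          # clamp to the 0..1 range
    rgb = [round(255 + (channel - 255) * t) for channel in dark]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


lightest = sequential_color(0)      # white end of the scale
darkest = sequential_color(100)     # the assumed dark blue
missing = sequential_color(None)    # no-data cell
```

Because every shade comes from one hue, a darker box always means a higher percentage, and the hatch marker keeps the no-data boxes visually distinct from the low end of the scale.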

Chart 15

I don't really have access to any software to do solid-modeling, so instead of doing an animation of a solid-model of the earth (which looked pretty pitiful in the original blog), I am using a different animation. Click here to see it animated.

Chart 16

For this chart, my version is a little cleaner, and I've moved a few of the labels to new locations.

Chart 17

The original chart had somewhat willy-nilly axis tick marks, and I wasn't real keen on using circles in the legend to represent the lines in the graph. I didn't have this exact data, so I chose some similar time-series data that let me show three overlaid lines. Notice that in addition to the color legend, I also added a label to the end of each line.

Chart 18

For this one, I used slightly different colors, and slightly larger/bolder text, but aside from that it was already a great graph. :-)

OK - time to enter your guesses in the comments section! Which software(s) were used to create which graphs?

Yep, I used SAS to create all 18 of these charts! And if you'd like to see the SAS code, I've set up an examples page.

This content was reposted from the SAS Learning Post. Go there to view the original.

Robert Allison, The Graph Guy!, SAS

Robert Allison has worked at SAS for more than 20 years and is perhaps the foremost expert in creating custom graphs using SAS/GRAPH. His educational background is in computer science, and he holds a BS, MS, and PhD from North Carolina State University. He is the author of several conference papers, has won a few graphic competitions, and has written a book called SAS/GRAPH: Beyond the Basics.


Re: Surprise message
  • 1/7/2017 4:58:57 PM

Very interesting as an example of how flexible SAS can be. The art of the graph certainly showed up well in that one example. I wonder if those with an artistic bent better than my brain's can appreciate that example and more readily read the data represented. I personally like "simpler the better" for my taste though, just easier for me to quickly figure out the relationships.

Re: Surprise message
  • 1/3/2017 6:14:48 PM

@Robert, congratulations on being able to convert them into SAS format. 4 were done with R, 3 with SPSS, 5 with Excel, 2 with Tableau, 1 with Matlab, 1 with Python, 1 with SAS, and 1 with JavaScript. My favorite is the very last chart. I get updates on Comsol, which is used with Matlab. Physicists and engineers use them. I often work with Excel and have easily converted data into charts, bars, lines, dots, etc. with a click of the mouse.

Re: Surprise message
  • 1/3/2017 2:58:20 PM

@Robert wow, so you're able to get a link in the comments! I haven't been able to do that for a long time. I'll try this one, then: https://www.prisonpolicy.org/blog/2016/12/30/our2016dataviz/?referrer=justicewire

I thought you might be interested in it because it shows a number of different style graphics to illustrate the American incarceration story. As I said, I prefer graphics that deal with just a small number of variables, so I do find the big pie chart rather overwhelming. 

Re: Surprise message
  • 1/2/2017 8:56:30 PM

It appears some of the links in the AllAnalytics copy of my blog post aren't working. I've sent JC an email, but an auto-reply tells me he's out of the office, so I'm not sure when the links will be fixed. In the meantime, I invite you to see the original copy of my blog (since that is one of the links that is broken, I'll give the full URL below):

http://blogs.sas.com/content/sastraining/2016/12/20/graphs-comparing-r-excel-tableau-spss-matlab-js-python-and-sas/

 

Re: Short answer: Nope
  • 1/2/2017 7:18:13 PM

@Terry I can say the same, way above my abilities.

Re: Surprise message
  • 1/2/2017 7:16:35 PM

@rbaz agreed. I like the way the wavy one, #11, looks, but more for art than for reading meaning into it. To gain insight, I prefer clearly defined graphs that focus on just a few variables, like #17.

Re: What were done in what
  • 12/31/2016 2:45:49 PM

Very impressive, @jmyerson! Since you're obviously speaking from experience, any favorites or preferred languages for graphics creation?

Re: Short answer: Nope
  • 12/31/2016 2:43:39 PM

Yup, same here, @Michelle.

This is so far above my paygrade I'm getting a nosebleed!

Re: What were done in what
  • 12/31/2016 8:16:56 AM

Hint: JS can be used to create animations

Re: What were done in what
  • 12/31/2016 4:16:51 AM

Robert asked which software was used to create which graphs. I can easily recognize one graph created with Matlab.
