Are you a visualization and graphing expert? Can you identify which tool (R, Excel, Tableau, SPSS, Matlab, JS, Python, or SAS) was used to create each of these graphs? No cheating!
I recently read Tim Matteson's blog where he presented 18 graphs, and had his readers try to guess which software was used to create each of them. I thought it was an interesting exercise, but I was a little disappointed in the graphs. My buddy Paul Kent said I should create my own new/improved version of each graph, and I thought that sounded like a splendid idea! Be sure to click the link above to see the original versions, so you can better appreciate the improvements.
Can you determine which software I used to create each of my improved versions?
The biggest problem in the original graph was that the colors and order of the bar segments didn't make sense - it seems like they should run bad-to-good, but the original graph had them in alphabetical order. Also, the Xnn labels along the left-side axis were cluttered and difficult to read. In my version I spaced the labels out more, and also left-aligned them so the 'X's line up, which makes them easier to read.
In the original chart, the colored area behind the questions made it look (at first glance) like those were bars, so I didn't color that area in my graph. I was a bit confused by the numbers to the left and right of the bars in the original, so in my version I color-coded these numbers so the user would know at-a-glance that the left number represents 'disagree' and the right number represents 'agree'. In survey data like this, I think it's important to be able to see whether over 50% of the respondents agree or disagree, so I added a reference line at 50%.
In the original chart, they had the axis labels along both the left and bottom, showing each label twice. In my plot, I placed the labels in the diagonal boxes, allowing me to show each label only once (and also eliminating the sideways labels along the left axis). I used transparent plot markers, so you can see where markers are stacking. I also used a different color for the markers than for the axes and text, so the markers stand out more.
The original chart used so many grid lines that I found it difficult to follow a line to the axis. I used years rather than months along the x-axis, because that seemed easier to understand for such a long time period (quick - how many years is 70 months!?! see what I mean!)
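To make the months-versus-years point concrete, here is a minimal Python sketch of the arithmetic (not the code I used for the graph). The function names `months_to_years` and `year_ticks` are made up for illustration, and it assumes the x-axis data is a simple count of elapsed months.

```python
def months_to_years(months):
    """Convert an elapsed-month count to (whole years, leftover months)."""
    return divmod(months, 12)


def year_ticks(max_months):
    """Month offsets that fall on whole-year boundaries.

    These are the positions where a year label could go
    on an axis whose underlying data is in months.
    """
    return list(range(0, max_months + 1, 12))


years, remainder = months_to_years(70)
print(f"70 months = {years} years and {remainder} months")
print("Year tick positions (in months):", year_ticks(70))
```

So 70 months is a bit under 6 years, which is much easier to read off an axis labeled in years.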
For this one, I left it pretty much as-is, except I placed the labels inside the longer bars (rather than outside), thereby making more room for the bars. I also explain what 'cola' is in the title, since it's an acronym most people probably aren't familiar with - wouldn't want people thinking this was a graph about soft drinks!
For this chart, I didn't have the original data, so I decided to go with some data that was similar, but less dense. I'm not sure what the original chart was trying to show, but I can't imagine it was doing a very good job of it (looked like a cluttered mess of points & lines to me).
In the original chart, I don't think the circles showed up very well against the black background - therefore I didn't put any circles on my version (if you want to see a black map with circles, have a look at my map with animated circles). Be sure to click here, to see the full size map (to get the full effect)!
The original chart was a simple scatter, with '+' markers, and dark grid lines. In my version, I used transparent round markers - this way you can see when multiple markers are stacked in the same location. I also used light grid lines, so the grid doesn't compete with the markers for your attention, and I added some summary statistics in the top-left corner of the graph.
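Here is a rough Python sketch of why transparent markers reveal stacking. It assumes the graphing software blends same-colored markers with ordinary alpha compositing, and the 0.3 opacity is just an illustrative value (not the setting from my graph). The function name `stacked_opacity` is also made up for this example.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n same-colored markers drawn on top of each other.

    Assumes standard 'over' alpha blending: each marker lets a fraction
    (1 - alpha) of what is behind it show through.
    """
    return 1.0 - (1.0 - alpha) ** n


# Illustrative opacity value; darker spots mean more markers stacked there.
for count in (1, 2, 3, 5):
    print(count, "stacked markers ->", round(stacked_opacity(0.3, count), 3))
```

The more markers pile up in one spot, the closer that spot gets to fully opaque, whereas solid markers look the same whether there is 1 point or 50.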
I'm not a big fan of using black backgrounds in a graph ... but if you're going to create any kind of graph, at least show the scales along the sides!
This is another one I didn't have the exact data for, so I used some similar data. The biggest change I made was using transparent markers so you can see where multiple markers are stacked on top of each other. I also used a grid of reference lines from both axes, rather than just one axis.
Although the original chart didn't have any labeling, I suspect it was some of Fisher's classic iris data set, so I used some of that data in my chart. The first improvement I made was labeling the graph, so you quickly know what I'm plotting. I also added a labeled picture of an iris flower, so you know what a petal and a sepal are.
I'm not a big fan of using 3d bars on a 3d map to show data, like they did in the original graph - the taller/front bars inevitably obscure some of the shorter/back bars, etc. Therefore in my graph I show how to plot data as markers on a 2d street map.
In the original chart, I'm not sure exactly which year(s) of earthquake data they used, since there is no title or label. In my chart, I show all the major earthquakes for a 40+ year time period, and I also center my map on the Pacific Ocean (so it better shows the 'ring of fire'). I also use circles rather than filled dots, so it's easier to see almost-overlapping markers.
In charts like this, I really don't like when people use a diverging color scheme (gradient shades of 2 colors, meeting in the middle) - those should be used when the scale goes from bad-to-good, etc. In this case, where the colors represent a simple "Percent of Trials", gradient shades of a single color should be used. They left-justified their Cancer Conditions, which placed the labels far from the chart and made it difficult to see which colored blocks went with which label - I right-justified them. Also, it was difficult to determine whether white boxes were light gradient shades or no-data. In my chart, I use a hatched pattern for no-data, to make the distinction more obvious.
And in the bottom (bar) chart portion, I was a bit confused by the numbers on top of the bars - after scrutinizing the graph a bit, I found that the numbers represent the difference between the Actual and Expected time. Therefore I tried to make that more obvious in my bar chart.
I don't really have access to any software to do solid-modeling, so instead of doing an animation of a solid-model of the earth (which looked pretty pitiful in the original blog), I am using a different animation. Click here to see it animated.
For this chart, my version is a little cleaner, and I've moved a few of the labels to new locations.
The original chart had somewhat willy-nilly axis tick marks, and I wasn't real keen on using circles in the legend to coincide with the lines in the graph. I didn't have this exact data, so I chose some similar time-series data that let me show three overlaid lines. Notice that in addition to the color legend, I also added a label to the end of each line.
For this one, I used slightly different colors, and slightly larger/bolder text, but aside from that it was already a great graph. :-)
OK - time to enter your guesses in the comments section! Which software(s) were used to create which graphs?
Yep, I used SAS to create all 18 of these charts! And if you'd like to see the SAS code, I've set up an examples page.
This content was reposted from the SAS Learning Post. Go there to view the original.