This week, we're headed in the opposite direction, looking instead for the commonalities, the consistencies, and the patterns. I introduced the concept of How Much, How Soon, How Certain some time ago, and since then I've focused primarily on the most neglected component of that trio, "How Certain," or put more straightforwardly -- risk.
"How Soon," which comprises NPV, IRR, and the time value of money, is better covered in other forums. But sometimes you just want to focus on the deceptively simple element of "How Much" -- What is it, and how big is it?
Consider once again the graphic to the left. Last time, we were concerned with the variability of the several ovals -- what shape, what angle, how wide, how dense. If I used the center of each as my "best guess" or the major axis as a trend line, how far off might my forecast or business decision be? How bad could it get?
Today, however, I want to consider the fact that there are ovals at all. Where did they come from? And the colors? Even the axes -- they aren't arbitrary. Imagine this graphic without the ovals and without the coloration -- just a plot of black dots on a white grid.
Noisy. Tough to see the forest for the trees. It's just a lot of stuff, isn't it? Kind of a tilted "T" shape to the stuff, but does that even mean anything? Now, put the colors and the ovals back in -- Wow! There's a pattern! Several, in fact. Clusters. Add some axis labels and a legend, and you are in business.
Even here, though, I cheated a bit -- we started with a graphic/plot. Initially this was just data in a database, or equally likely, rows and columns in a spreadsheet. There's what, 150 or so data points here, each with perhaps a magnitude, an X and a Y coordinate, and some related attribute that eventually got translated into "color." Four columns and 150 rows in a spreadsheet.
So you do the obvious and sort by the attribute in column A. Now what? Do you see clusters of roughly 25 each in some sort of two-dimensional relationship with each other, let alone the "T" pattern? Even with Ted Williams' 20/15 vision, you're not going to get much more insight out of that worksheet.
But when you can apply some analytics to the data, and then consume it with some helpful visual clues such as color, size, shape, shades, labels, axes, legends, and so forth, insight just jumps out at you. Not sure quite how to do that?
No problem -- that's what SAS Visual Analytics has been designed for. It creates a virtual data sandbox that you can query and play around in to determine the appropriate visualization based on the underlying raw data -- a way to reduce the noise. (Visualization? Noise? Sometimes a mixed metaphor actually works out!) Its use of Autocharting displays the most appropriate visual when you drag and drop any combination of categories and measures onto the visualization pane.
Getting back to the matter at hand, the "How Much" question, sometimes you need to discard the outliers and look past the exceptions that give you all that risk trouble and just concentrate on the big picture. Sometimes you need to filter out the noise and get down to fundamentals. A few basic analytical techniques are all you need to turn an otherwise random-looking spreadsheet into tangible and actionable information:
- Clustering (as described above). Identify common attributes, and discover areas of critical mass in your production, supplier, or customer data. Hierarchical clustering can generate insight into important subsets of your main groupings that you might have otherwise overlooked.
- Market basket analysis. Similar to clustering and familiar to most B2C marketing functions, this tells you which products should be bundled or which products sell best to particular customer types.
- Sequence and path analysis. Which shared resource costs or operational activities comprise most of the cost of your best-selling products? How do your customers navigate your website, where do they get lost, and when are they most likely to end with a purchase?
- Mapping. Often, simply representing the data spatially can lead to great insights, such as correlations between specific attributes and physical, political, or cultural geographies. Modern epidemiology was born with Dr. Snow's mapping of cholera outbreaks (right) correlated with contaminated community water sources (neighborhood wells/pumps), a profound insight from a simple technique.
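To make the first of these techniques concrete, here is a minimal sketch of clustering in plain Python. It is not how SAS Visual Analytics does it; it is a bare-bones k-means, and the nine points, the function names, and the farthest-point seeding are all assumptions chosen so the example stays small and repeatable.

```python
import math

def farthest_point_seeds(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from every center chosen so far (a simple deterministic choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iterations=50):
    """Bare-bones k-means: alternate assignment and center updates until stable."""
    centers = farthest_point_seeds(points, k)
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical (x, y) rows, as they might sit unsorted in a spreadsheet.
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.2, 4.9), (8.8, 1.2),
    (0.8, 1.1), (4.8, 5.3), (9.1, 0.7),
]

centers, clusters = kmeans(points, k=3)
for center, members in zip(centers, clusters):
    print(f"center=({center[0]:.2f}, {center[1]:.2f}) members={sorted(members)}")
```

Listed as rows, those nine points look like noise; grouped, they fall into three tight clumps. Real data is rarely this well separated, and the choice of k is itself a judgment call.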
Often just being able to redisplay the data in different graphical formats can be extremely helpful, and even more so when the different formats can be displayed side-by-side simultaneously. What was completely obscured in a spreadsheet may become slightly more evident in a bar or pie graph but smack-me-upside-the-head-duh when properly segmented or displayed as a heat map. The basic differences between mean, median, and mode may seem merely academic if the data is presented in nothing more than a table, but could significantly change the decision outcome when the team gets to internalize the graph and what it implies visually.
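The mean, median, and mode point is easy to check for yourself. The sketch below uses Python's standard `statistics` module on a made-up list of order sizes (the numbers are purely illustrative) in which one large order drags the mean well away from what a typical order looks like.

```python
from statistics import mean, median, mode

# Hypothetical order sizes: mostly small orders plus one very large one.
order_sizes = [2, 2, 2, 3, 4, 5, 6, 8, 40]

order_summary = {
    "mean": mean(order_sizes),      # pulled upward by the single large order
    "median": median(order_sizes),  # the middle value once the orders are sorted
    "mode": mode(order_sizes),      # the most frequent order size
}

for name, value in order_summary.items():
    print(f"{name}: {value}")
```

Here the mean is 8, the median is 4, and the mode is 2. Plan around "the average order is 8" and you will be planning for an order that almost never shows up, which is exactly the kind of thing a histogram makes obvious at a glance.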
This is what the combination of analytics and visualization does best -- together they filter out the noise so that you are left with the core concerns. Decision making under uncertainty is tough enough -- no sense wasting time and effort striving for precision and accuracy around the wrong variables or issues. No matter how you choose to mix your metaphors, data visualization turns down the noise so that you can hear yourself think.
Is it just me or did it suddenly get quieter? Did you hear that insight? Did you see that insight?
[If you are interested in further exploring the topic of noise in the decision-making process, have some fun with this TED Talk by Daniel Wolpert, "The real reason for brains".]
This originally appeared in the SAS blog Value Alley.