Regression methods are so familiar that it's easy to overlook vital information hiding in plain sight, so always give some thought to regression to the mean.
Because regression is one of the most common tools in business analytics, managers usually just glance at the p-value for the F-statistic, plus the estimate and standard error for each coefficient and the intercept, and move on. But hidden in the word "regression" is a clue to other information that may be right there in front of you.
Regression gets its name from something Francis Galton, the Victorian polymath who did much of the early development work, noticed while graphing children's heights against their parents' heights. Although parental height was a good predictor of a child's height, very tall parents tended to have children shorter than themselves, and very short parents tended to have children taller than themselves. He called the effect regression (his phrase was "regression towards mediocrity"), because in repeated samples from the same population, the outliers (points extremely far from the estimated regression line) tended to regress, or move closer to that line, on the next repetition of the test.
Random error is what drives this effect (in statistics, errors are not mistakes, but simply differences between predicted and observed values). In the usual regression model the errors follow a normal (bell-curve) distribution, so small errors are common and large ones are rare. When one of the few large errors happens to land on an extreme value (i.e., a value close to the minimum or maximum), you get an outlier.
The chances of an extreme value are small, the chances of a big error are small, and the chances of both together are very small, which is why outliers are scarce.
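To put rough numbers on that reasoning, here is a small calculation. It is a sketch under two assumptions that are not in the text above: that "extreme" and "big" both mean more than two standard deviations from the center of a normal distribution, and that an extreme value and a big error occur independently, so their probabilities multiply. The helper name `two_sided_tail` is illustrative.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2))


p_extreme_value = two_sided_tail(2)  # assumed cutoff: beyond 2 standard deviations
p_big_error = two_sided_tail(2)      # same assumed cutoff for the error
p_both = p_extreme_value * p_big_error  # valid only if the two are independent

print(f"extreme value: {p_extreme_value:.4f}")  # 0.0455
print(f"big error:     {p_big_error:.4f}")      # 0.0455
print(f"both:          {p_both:.5f}")           # 0.00207
```

Under these assumptions, roughly 1 case in 22 is extreme on either count, but only about 2 in 1,000 are extreme on both.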
Regression to the mean happens naturally, because the odds of being an outlier twice in a row are even lower. The most likely outcome, the next time the process is repeated, is that the cases that were outliers before will land closer to the predicted line. That's the meaning of regression to the mean: today's star (or biggest loser), tomorrow's Average Joe.
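A quick simulation makes "today's star, tomorrow's Average Joe" concrete. It is a minimal sketch under assumed numbers: each hypothetical salesperson has a fixed skill level (mean 100, standard deviation 10) plus fresh, independent, normally distributed luck (standard deviation 10) each period, and nobody is rewarded or punished. The top and bottom 10% from period 1 still drift toward the average in period 2.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

N = 10_000   # hypothetical number of salespeople
K = N // 10  # size of the "star" and "goat" groups (top and bottom 10%)

# Assumed model: a stable skill level plus fresh, independent luck each period.
skill = [random.gauss(100, 10) for _ in range(N)]
period1 = [s + random.gauss(0, 10) for s in skill]
period2 = [s + random.gauss(0, 10) for s in skill]

# Rank everyone by period-1 results; no incentives or penalties are applied.
ranked = sorted(range(N), key=lambda i: period1[i])
goats, stars = ranked[:K], ranked[-K:]


def group_mean(scores, members):
    """Average score of the given members."""
    return statistics.fmean(scores[i] for i in members)


overall2 = statistics.fmean(period2)
star1, star2 = group_mean(period1, stars), group_mean(period2, stars)
goat1, goat2 = group_mean(period1, goats), group_mean(period2, goats)

print(f"overall average, period 2: {overall2:.1f}")
print(f"stars: {star1:.1f} -> {star2:.1f}")
print(f"goats: {goat1:.1f} -> {goat2:.1f}")
```

With these assumed settings, skill and luck contribute equally, so both groups should land roughly halfway back toward the average. Raising the luck term relative to skill makes the regression stronger; shrinking it makes the extremes more persistent.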
Regression to the mean explains the "sophomore slump" in sports stars: A brilliant rookie who racked up great stats in the first year was probably also unusually lucky, and is unlikely to repeat them in the second season. It's a factor in the sales manager's frustration with the salesperson who sells like mad for his or her first period, then turns average. And it explains why some managers become addicted to penalties and negatives: Regression to the mean makes top performers slip and poor ones improve even when nothing is done, so it looks as if the bonuses to the top people are failing and the punishments for the bottom people are succeeding.
In analytics-driven management, if you are alert for regression to the mean, you can avoid being fooled, spot unusual problems and opportunities, and brace for some likely trouble:
- Not being fooled: It's important to sort out talent from luck. Before launching an incentive program, look to see if the "star" and the "goat" performers regress to the mean over time. If they do, providing an incentive will just lower morale.
- Unusual problems and opportunities: If most of the population regresses to the mean but a few people are consistently stars or duds, you're likely looking at a black swan that warrants further investigation. Perhaps somebody has a skill they're not sharing, or a few people follow a disastrous procedure you want to warn against in training.
- Bracing for trouble: Some businesses have sharply spiked distributions for sales, supply costs, orders, and so on: one customer might account for a third of all orders, or one field office might handle half of all customer complaints. If that is the case, and you have observed regression to the mean in the overall data, expect a perfect storm or a jackpot at some point, because one of the biggest sources of blessings or troubles will eventually become an outlier. Then numbers that have been stable, steady, predictable, and generally comfy for years or decades will lurch into completely new territory.
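The "look to see" check in the first bullet, and the search for consistent stars or duds in the second, can be sketched in a few lines. These are hypothetical helpers (`retained_gap`, `persistent`), not a standard method: given two periods of scores keyed by person, office, or customer, the first reports what fraction of the star and goat groups' period-1 distance from the average survives into period 2, and the second lists who lands in the same extreme group both times. A retained fraction near 0 points to luck; near 1 points to something persistent. Small groups make these numbers noisy, so treat them as a prompt for a closer look rather than a verdict.

```python
import statistics


def retained_gap(period1, period2, share=0.1):
    """Fraction of each extreme group's period-1 gap from the average that
    survives into period 2 (about 0 = luck, about 1 = persistent).

    period1, period2: dicts mapping the same keys (people, offices, ...) to scores.
    share: fraction of the population treated as "stars" and as "goats".
    """
    keys = sorted(period1, key=period1.get)  # lowest period-1 score first
    k = max(1, int(len(keys) * share))
    mean1 = statistics.fmean(period1[key] for key in keys)
    mean2 = statistics.fmean(period2[key] for key in keys)
    result = {}
    for name, members in (("goats", keys[:k]), ("stars", keys[-k:])):
        gap1 = statistics.fmean(period1[m] for m in members) - mean1
        gap2 = statistics.fmean(period2[m] for m in members) - mean2
        result[name] = gap2 / gap1 if gap1 else float("nan")
    return result


def persistent(period1, period2, share=0.1, top=True):
    """Keys that fall in the same extreme group (top or bottom) in both periods."""
    def extreme(scores):
        ranked = sorted(scores, key=scores.get, reverse=top)
        return set(ranked[: max(1, int(len(ranked) * share))])
    return extreme(period1) & extreme(period2)


# Hypothetical data: ten salespeople over two periods.
q1 = {"A": 150, "B": 110, "C": 105, "D": 100, "E": 100,
      "F": 100, "G": 95, "H": 90, "I": 80, "J": 70}
q2 = {"A": 108, "B": 104, "C": 98, "D": 101, "E": 99,
      "F": 107, "G": 105, "H": 96, "I": 112, "J": 70}

print(retained_gap(q1, q2))           # {'goats': 1.0, 'stars': 0.16}
print(persistent(q1, q2, top=True))   # set()
print(persistent(q1, q2, top=False))  # {'J'}
```

In this made-up example, the period-1 star keeps only 16% of its lead, which looks like luck, while the goat's shortfall persists fully, which is the kind of consistent dud worth investigating.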
Understanding regression to the mean can help you quickly separate accidents from permanent changes, figure out where the "100-year floods" are bound to hit eventually, and perhaps even have a plan in place for an overabundance of blessings.
Watch your outliers. Regression to the mean happens -- in fact, the basic technique is named after it. If you keep that possibility in mind, you've positioned yourself to be the good kind of outlier.