John Barnes

Don't You Regress With Your Regressions!

John Barnes
User Rank
Blogger
Re: Regression to the mean: Average versus Outlier
John Barnes   12/9/2012 5:16:14 PM
Louis, I'd phrase it just a little differently, that it's a matter of what field you're in.  If you're a sandwich maker, you can be right there on that regression line, or even a bit below it, and still work.  But if you're going to be a pro tennis player, you need to be an outlier -- way over to the right and consistently above that regression line.  And if you want to be a pop star ... all that, plus being an extreme outlier (i.e. VERY lucky).


Something I just noticed thanks to this excellent blog post (which I found in David Brin's even-more-excellenter blog post) is that although the black swan metaphor is catchy, the book itself suffers from the problem that the author thinks that because you can't predict the specifics, you'll always be surprised by rare events.  But often just knowing that the surprise is possible carries a lot of information. We know a massive earthquake, bigger than the California Big One, will happen someday with an epicenter somewhere around New Madrid, Missouri. We know that using a nuclear EMP to disable machinery over a wide area, something a government or terrorist organization is extremely likely to do in the next fifty years, also means that an enormous number of financial records (such as bank accounts) could be wiped. We know that if a late-season hurricane, a high plains blizzard, and a large arctic air mass all converge on the East Coast megalopolis .... oh, wait.

Before the Panic of 1907 it was well-known that just a few insurance companies covered the city of San Francisco, and their stock was mostly owned by a few large banks in New York, which in turn were linchpins of that city's (and therefore the country's) banking system (back before the Federal Reserve).  How likely was it that suddenly all those insurance companies would have to pay up on all their policies all at once?  (Hint: what happened in San Francisco in 1906?)


The extreme outliers are important and worthy of our attention, even if it's just to confirm they are flukes.

Louis Watson
User Rank
Blogger
Regression to the mean: Average versus Outlier
Louis Watson   12/9/2012 4:58:51 PM
Thanks, John, for the refresher on regression. I do think that because it is not a sexy tool, its information often gets ignored.  But I like how you have shown how regression to the mean explains a whole host of occurrences (I really like the sophomore slump example).

It seems like everyone is destined to be average except for the occasional outlier, which is I guess how it has always been - so the question is how to become and stay an outlier against the odds?

Very interesting insight and food for thought - thanks again John.

John Barnes
User Rank
Blogger
Re: Investors too
John Barnes   12/4/2012 12:59:20 AM
Seth -- that's very true, though not itself an example of regression to the mean.  But an analyst would still use regression to the mean as an alternate hypothesis, to demonstrate the reality of fatigue.  Here's how the analyst would reason it out:

In either a genuine regression-to-the-mean case or a case of fatigue, nearly all top performers in the first period evaluated would fall down the ladder in some later period. But in classic regression to the mean, the fall would be overwhelmingly likely to happen between period 1 and period 2.  If any top performers survived both period 1 and period 2, their fall would be overwhelmingly likely (and with the same probability) between periods 2 and 3, and so on. In other words, a top performer would have a high probability of falling between any two periods, and that probability would be fairly constant.

But in fatigue, you'd see the probability of a fall start out fairly low and rise with time, probably nonlinearly.  You could quickly confirm this by plotting and/or regressing your residuals (statistical errors) against time.  Comparing the errors of the two models would quickly reveal that fatigue explained things much better.
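
A minimal sketch of that comparison in Python, under loudly stated assumptions: the simulation, the function name fall_rates, the fixed performance bar, and every parameter value are hypothetical illustrations, not anything taken from real data. It follows the workers who cleared the bar in every period so far and reports what share of them fall in each following period.

```python
import random


def fall_rates(skill_sd, noise_sd, fatigue, n_workers=100_000,
               n_periods=5, threshold=0.8416, seed=1):
    """Share of surviving top performers who fall, period by period.

    All settings are illustrative assumptions. A worker is "top" in a
    period if skill - fatigue * streak**2 + noise clears `threshold`
    (0.8416 is the 80th percentile of a standard normal).
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_workers)]
    survivors = list(range(n_workers))  # topped every period so far
    rates = []
    for period in range(n_periods):
        if not survivors:
            break
        streak = period  # survivors have been on top this many periods
        still_top = [
            i for i in survivors
            if skills[i] - fatigue * streak ** 2
            + rng.gauss(0.0, noise_sd) > threshold
        ]
        if period > 0:
            rates.append(1 - len(still_top) / len(survivors))
        survivors = still_top
    return rates


# Pure luck: no skill differences, no fatigue, performance is all noise.
luck = fall_rates(skill_sd=0.0, noise_sd=1.0, fatigue=0.0)
# Fatigue: real skill differences, little noise, a penalty that grows
# with the square of the time spent on top.
tired = fall_rates(skill_sd=1.0, noise_sd=0.1, fatigue=0.05)

print("luck   :", [round(r, 2) for r in luck])
print("fatigue:", [round(r, 2) for r in tired])
```

Under these assumed settings the pure-luck fall rates should stay roughly flat from period to period, while the fatigue fall rates should start low and climb. That is the signature an analyst would look for before plotting residuals against time.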

So a reasonably sharp analyst, put on the problem, could tell the manager that this was not regression to the mean (which fundamentally can't be fixed) but a situation where the right resources applied correctly (sabbaticals, incentives) could make everyone better off. 

That's the beauty of always considering and testing for regression to the mean; whether you find it or not, it always puts you a long step closer to understanding the situation.

SethBreedlove
User Rank
Data Doctor
Re: Investors too
SethBreedlove   12/3/2012 8:54:35 PM
When it comes to employee performance, fatigue must be counted in. Very few people can be top performers for years.   I had problems with one boss when I became less productive, yet still the top producer.  You can drive a Ferrari at top speed only for so long before it breaks down.

John Barnes
User Rank
Blogger
Re: Investors too
John Barnes   12/3/2012 4:48:14 PM
PredictableChaos:

Absolutely right, and very wise to boot!  In so many ways the most important thing we learn from regression to the mean is not to give ourselves too much credit.

PredictableChaos
User Rank
Data Doctor
Investors too
PredictableChaos   12/3/2012 4:41:39 PM

This happens with investors and financial advisors too - one hot streak early in a career and you're a legend in your own mind.  Which can lead to overconfidence and all kinds of nutty behavior.

So I guess I should be happy that my first stock investments were more like touching a hot stove.  I learned humility much faster than if I'd picked a big winner.

John Barnes
User Rank
Blogger
Re: I'm regressing...but I digress
John Barnes   12/3/2012 2:50:29 PM
The strange thing is that some of the hotshots created by regression to the mean believe their own hype; they think there must be something they're doing that's causing the streaks.  Addicted gamblers are notoriously that way; they think they had mojo that accounted for that one wonderful time early in their career that everything went so well, and they can spend (and destroy) the rest of their lives trying to get that illusory mojo back.  But some businesses, occupations, and situations are just streaky by nature (Claude Shannon, back in the 1940s, worked out why in a purely random process, streakiness is more likely than steadiness) and quiet, persistent, do-it-right-every-time effort only pays out on the average over the very long run, so more "streak addicts" are born every year -- and end up wasting their lives chasing after the "streak magic" that doesn't exist.
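
A small illustration of how streaky pure chance can look, written as a hedged Python sketch: the coin-flip setup, the function names longest_run and average_longest_run, and the flip and trial counts are all assumptions chosen for the example. It measures the longest run of identical outcomes in sequences of fair coin flips.

```python
import random


def longest_run(outcomes):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 0
    previous = object()  # sentinel that equals nothing in the sequence
    for outcome in outcomes:
        current = current + 1 if outcome == previous else 1
        best = max(best, current)
        previous = outcome
    return best


def average_longest_run(n_flips=200, n_trials=2000, seed=7):
    """Average longest run over many simulated fair-coin sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        total += longest_run(flips)
    return total / n_trials


print("average longest run in 200 fair flips:", average_longest_run())
```

With no skill or "mojo" anywhere in the model, a typical sequence of 200 flips still contains a run of several identical outcomes in a row, which is easy to mistake for a hot streak.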

Callmebob
User Rank
Master Analyst
I'm regressing...but I digress
Callmebob   12/3/2012 2:44:41 PM
John, on the money with your sales and sports stars slumping or falling off a cliff comparisons. I've been involved in sales for years, many of them as a sales manager. I've always looked for performance consistency over hotshots and spikes. Why? The mercurial hotshots or rainmakers are prone to be hot and cold, not steady, even dipping below the mean. And as a sales manager I would never completely tie my wagon to a star sales person or customer for that matter.

There was one guy I know (used to work with him) who was one of these slam-bam type of sales people who would outshine everyone...but only for brief periods and then he'd sink and his sales would dip. That caused management to start asking questions about commitment and effort. This guy was always ready and would typically respond not by pumping up his effort but by quickly bailing out and going to a new company where they were impressed with his hot sales record. He'd typically last a year at a company and leave before the annual performance review.

BethSchultz
User Rank
Blogger
Re: Good reminder
BethSchultz   12/3/2012 2:24:36 PM
@John, the freelancer's bane! (Which I know well from a previous life, although I didn't know it had anything to do with regression to the mean.) 

 

John Barnes
User Rank
Blogger
Re: Good reminder
John Barnes   12/3/2012 2:17:30 PM
Beth, I learned it the hard way, from someone who was not a better analyst than me but who paid attention to the analytics at a time when I didn't.  She pointed to the variability in payment times and amounts from various clients, and said, "John, right here is where you'll go broke.  You can afford to have all this wobble in the behavior of the minor clients, and let it even out, but the first time a major client does that, you'll be broke."


Unfortunately she was absolutely right; one day the single client that was 40% of my income turned into an outlier for payment time and for commissioning new work.  It's not the average wave, but the biggest one, that can sink you.


And it was all perfectly predictable from the fact that clients in that business showed a distinct pattern of regressing to the mean.  The rest of you don't need to put your hand on the hot stove to find out it's a bad idea; just sniff my charred fingers!
