
- by James Connolly, Blogger

- 1/3/2017 1:21:23 PM

Sorry, John. We all use the same technique of finding an example that people can relate to in explaining concepts. I loved the example but did have to laugh. The lemonade stand brought back memories of my own kids and their friends trying to sell lemonade. The only money to be made came from a couple of kind neighbors and the same moms and dads who paid for the ingredients and cups, and the kids drank more lemonade than they sold.

- by John Barnes, Blogger

- 1/2/2017 11:15:28 PM

James,

Re the rate of return on that lemonade stand -- This is what I get for making up my numbers as I go rather than starting from the result I want and working backwards, as was traditional in the business world!

- by James Connolly, Blogger

- 1/2/2017 2:06:02 PM

@John. $50 or $60 a day from a lemonade stand? You can let Mom stay home while you guys take a cab to the store for supplies.

- by John Barnes, Blogger

- 1/2/2017 7:49:16 AM

bkbeverly,

Well, let's work through an example.

Here's us Traditional Statistical Kids and our lemonade stand, and we've done a regression and found that we can sell 50 cups of lemonade at $1 each when the high is 72 Fahrenheit, and an extra 6 cups of lemonade for every degree Fahrenheit above, and the standard deviation of the beta (that is, of that 6 cups) is 1 cup. The weather forecast is accurate with, let's say, a standard deviation of 3 degrees. Say it takes a pound of sugar and ten lemons to make fifty cups of lemonade, and we have to buy sugar by the pound at $2 and lemons by the half-dozen at $2. So the forecast for tomorrow is 91 degrees, and let's assume we are out of supplies. Now Mom is driving us to the store. What should we buy?

Well, by our regression, 19 degrees times 6 cups per degree plus 50 base cups = 164 cups of lemonade. That means 3.28 pounds of sugar and 32.8 lemons (call it 33), so we will buy 4 pounds of sugar ($8) and 6 half-dozens of lemons ($12), or $20 of supplies, and we expect to pull in $164 in sales, greedy little rascals that we are, and being kids, we'll probably just figure it's $144 profit, not figuring things like opportunity costs or wages to ourselves into it. (This is why economics teachers like lemonade stands as examples.)
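For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function name `plan_purchase` and the constant names are made up for illustration; the figures come from the example above, and rounding up to whole pounds and whole half-dozens is an assumption about how the kids shop.

```python
import math

# Figures taken from the lemonade example; the names are illustrative.
BASE_CUPS = 50          # cups sold when the high is 72 F
BASE_TEMP_F = 72
CUPS_PER_DEGREE = 6     # regression slope: extra cups per degree above 72 F
CUPS_PER_BATCH = 50     # one pound of sugar and ten lemons make fifty cups
LEMONS_PER_BATCH = 10
SUGAR_PRICE = 2         # dollars per pound
LEMON_PACK_SIZE = 6     # lemons are sold by the half-dozen
LEMON_PACK_PRICE = 2    # dollars per half-dozen
PRICE_PER_CUP = 1       # dollars


def plan_purchase(forecast_high_f):
    """Point-estimate plan: expected cups, supplies to buy, naive profit."""
    cups = BASE_CUPS + CUPS_PER_DEGREE * (forecast_high_f - BASE_TEMP_F)
    sugar_lbs = math.ceil(cups / CUPS_PER_BATCH)
    lemons = math.ceil(cups * LEMONS_PER_BATCH / CUPS_PER_BATCH)
    lemon_packs = math.ceil(lemons / LEMON_PACK_SIZE)
    cost = sugar_lbs * SUGAR_PRICE + lemon_packs * LEMON_PACK_PRICE
    revenue = cups * PRICE_PER_CUP
    return {
        "cups": cups,
        "sugar_lbs": sugar_lbs,
        "lemon_packs": lemon_packs,
        "cost": cost,
        "naive_profit": revenue - cost,  # ignores wages and opportunity cost
    }


print(plan_purchase(91))
```

Note that this only reproduces the single best guess; it says nothing about how uncertain that guess is.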

But down the street at the Entropy Kids Lemonade Stand, they're figuring something else: how certain are they of making their best profit by purchasing 3, 4, or 5 sets of lemonade supplies? (I.e., they're dividing the possible outcomes into bins, using those standard deviations: will they need to make 150, 200, or 250 cups of lemonade?) We'll assume that if they actually sell 250 cups they'll just bag it and quit early.

Working out the distribution, the cups they might sell break down to:

150 or less, 32.1% chance, expected profit (classically figured) $129.15

151-200, 58.9% chance, expected profit (classically figured) $139.90

201-250, 8.9% chance, expected profit (classically figured) $135.74

Remember, though, that expected profits are actual profits weighted by probability -- and the actual number could range from $52 (if everything goes wrong and they sell only 80 cups but they bought supplies for 250 cups) up to $222 (if they bought supplies for and sell exactly 250). The much narrower 150 cup distribution still ranges from $64 to $134, risking a bit less but also forgoing the chance of much higher profits.

The entropy of that distribution is 1.29 bits -- between 1 and 2 coin flips, or about half of rolling a die. In other words, it's quite uncertain.
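As a rough check on that figure, here is a short Python sketch of the Shannon entropy of the three-bin distribution. The helper name `entropy_bits` is just an illustration; the probabilities are the rounded ones listed above, so they sum to 0.999 and the sketch renormalizes them, which is an assumption.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Rounded bin probabilities from the comment: <=150, 151-200, 201-250 cups.
raw_bins = [0.321, 0.589, 0.089]
total = sum(raw_bins)
bins = [p / total for p in raw_bins]  # renormalize away the rounding gap

h_stand = entropy_bits(bins)
h_coin = entropy_bits([0.5, 0.5])     # one fair coin flip
h_die = entropy_bits([1 / 6] * 6)     # one roll of a fair die

print(f"lemonade bins: {h_stand:.2f} bits")
print(f"coin flip:     {h_coin:.2f} bits")
print(f"half a die:    {h_die / 2:.2f} bits")
```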

So what do the Entropy Kids do about that uncertainty?

Well, in the first place, this being a kids' lemonade stand, they're not going to lose much money. They could decide to just swing for the fence and go for the 250 cups, knowing that it's all very uncertain but they're not going to lose.

OR -- they could decide to see if Mom will give them a ride to the grocery store tomorrow morning, when much later weather reports cut the standard deviation for the predicted high to one degree. (That could reduce the entropy by more than half).

OR -- they could look for some kind of emergency supply run option (maybe a big brother who can bicycle to the store for more supplies if need be), and accurately figure out what it would be worth to pay big brother, say, a $3 retainer to be available till 1, and $2 if he actually has to make the run.

OR ... many other options. That, after all, is the point of the new statistics -- making better use of the information you have.

- by John Barnes, Blogger

- 1/2/2017 6:01:15 AM

kq4ym, once again, it's not a guarantee; it's a more accurate estimate of how much we know or don't. If you get an annual PSA screening for prostate cancer, it doesn't guarantee either that you'll know you don't have it (there are false negatives) or that you will know you do (there are false positives), but it does mean you will be much less uncertain -- and that uncertainty has numbers attached that you can use to make better decisions.

- by John Barnes, Blogger

- 1/2/2017 5:56:31 AM

Zimana, it may be counterintuitive, but you've intuited a critically important point. Entropy analysis is tremendously useful in answering the question "If we spend the time and money to find that out, how much better off will we be? How much closer will we be to making that theoretical ideal, the decision with perfect information, than we are right now?"

- by John Barnes, Blogger

- 1/2/2017 5:54:08 AM

kq4ym,

Well, the quick answer is yes, quantity of data still matters quite a bit and so does sample size. The new way has not overthrown the old, just enhanced and modified it.

Let's run an example. On a fair die, each face has a 1/6 chance of coming up, so the six possibilities are all equal, and the entropy is -6*(1/6)*log_{2}(1/6) = log_{2}(6) = 2.585 bits.

If you **knew for sure (uncertainty =0) that a die was loaded** so that 3 came up 30% of the time (about twice as often as it would on a fair die), the entropy of each roll would be smaller: -[.3*log_{2}(.3) + 5*.14*log_{2}(.14)] = 2.507 bits. (Note the entropy does not go to zero, because that loaded die is still giving all the possible results, just not in equal numbers.)
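Those two entropies are easy to check in a few lines of Python. This is only a sketch: `entropy_bits` is an illustrative helper name, and the loaded die is assumed to spread the remaining 70% evenly over the other five faces (14% each), as in the formula above.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


fair_die = [1 / 6] * 6
loaded_die = [0.30] + [0.14] * 5   # face 3 at 30%, the other five at 14% each

h_fair = entropy_bits(fair_die)
h_loaded = entropy_bits(loaded_die)

print(f"fair die:   {h_fair:.3f} bits")
print(f"loaded die: {h_loaded:.3f} bits")
```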

Let's suppose you begin with an estimate that there's a 50% chance the guy you're rolling dice with is a crook. If he is, he's taking advantage of knowing about that entropy difference (he knows the chances are he'll get about twice as many 3s as he should), and you should either quit playing with him or start betting the way he does (if there are other players in the game and you're a crook yourself).

The entropy of the question "is this guy a crook" is starting out as a coin flip's entropy -- 1 bit.

Now, suppose you get to test-roll that die. What does it mean if it comes up with a 3 on one test roll? In classic probability, using the binomial distribution for both possibilities, it turns out you now have about a 64% chance that you're playing with a crook. The decrease in entropy is from 1 bit (coin flip) to .94 bits (only 6% better than flipping a coin). Notice that you might be quite tempted in classic probability to say, okay, good enough, I'm rolling with a crook -- but the entropy tells you that you haven't learned much from one roll.

Now what about larger numbers of rolls? How much can you learn about the die from more repeated trials?

Suppose you get these results, where you roll a given number of times and get a specified number of 3s:

You roll 10 times and get 4 threes. Classic probability that it's loaded: 79%. Entropy: .748 bits

You roll 50 times and get 14 threes. Classic prob: 87%. Entropy: .542 bits

You roll 100 times and get 31 threes. Classic probability: 99.7%. Entropy: .021 bits

You roll 1000 times and get 298 threes. Classic: better chances than the sun coming up in the morning. Entropy: For practical purposes, zero.

Now, how is entropy different and possibly better? The entropy measures, not the chances, but how much doubt (uncertainty) remains. Suppose it costs you a penny a roll to conduct your tests. Then a single test for one cent produces almost no change in what you actually know (entropy only goes from 1 to 0.94), and that's not a penny well spent. On the other hand, for a dime, you can get rid of .252 bits of entropy; for a dollar, get rid of 0.979 bits, i.e. almost the whole thing. And clearly ten dollars -- a thousand rolls -- gains almost no more information than 100 did. So entropy makes it possible to put a price on "how sure do you want to be" and also, rather than just a best guess at the odds, tells you how good the guess is. For many decisions, that's much more important.
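Here is a hedged Python sketch of how numbers like those can be reproduced. It assumes a plain Bayesian update from the 50% prior, with a binomial likelihood for the count of 3s under each hypothesis (30% per roll if loaded, 1/6 if fair); the function names are illustrative, and the results should land close to, though not necessarily exactly on, the rounded figures quoted above.

```python
import math

P_THREE_LOADED = 0.30   # chance of a 3 on the loaded die
P_THREE_FAIR = 1 / 6    # chance of a 3 on a fair die
PRIOR_LOADED = 0.50     # starting belief that the die is loaded


def posterior_loaded(rolls, threes, prior=PRIOR_LOADED):
    """Posterior probability that the die is loaded, via Bayes' rule.

    The binomial coefficient is the same under both hypotheses and cancels,
    so only the log-likelihood ratio is needed (log space avoids underflow).
    """
    others = rolls - threes
    log_lr = (threes * math.log(P_THREE_LOADED / P_THREE_FAIR)
              + others * math.log((1 - P_THREE_LOADED) / (1 - P_THREE_FAIR)))
    prior_odds_against = (1 - prior) / prior
    return 1.0 / (1.0 + prior_odds_against * math.exp(-log_lr))


def binary_entropy_bits(p):
    """Remaining uncertainty, in bits, about a yes/no question."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)


# Test results from the comment, at an assumed cost of one cent per roll.
for rolls, threes in [(1, 1), (10, 4), (50, 14), (100, 31), (1000, 298)]:
    p = posterior_loaded(rolls, threes)
    h = binary_entropy_bits(p)
    print(f"{rolls:>5} rolls (${rolls / 100:.2f}): "
          f"P(loaded) = {p:.4f}, entropy left = {h:.3f} bits, "
          f"removed = {1 - h:.3f} bits")
```

Reading down the last column is the "price of certainty" argument in miniature: most of the bit is bought by the first hundred rolls, and the next nine hundred buy almost nothing.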

- by John Barnes, Blogger

- 1/2/2017 4:14:35 AM

Louis,

It's not necessarily the quantity of data you need to get; it's the distribution of probability. Let me give you a super-simple example: if you bet on a coin flip with me, and you know the coin is fair, then you should take the bet (on heads or tails, whichever I offer) whenever I offer you something better than 50-50 and refuse it whenever I offer you 50-50 or worse. (Let's imagine I'm running the world's dumbest casino here, I guess.)

Now, let's say there's a slight chance that the coin is a little bit loaded -- say you know it could actually be 51-49 heads -- and if you knew that for sure, you'd only bet when I offered 50-50 for heads or 51-49 for tails or better.

Entropy allows you to factor in the uncertainty of which of the two possible games you're playing, and to determine whether it would be worth it, for example, to pay $5.00 to be allowed to test that coin by flipping it 1000 times (which would give you a better estimate of the probability that it was loaded, but not a definite answer as to whether it was).

Another example that many data science folks use: if you know the Monty Hall Problem, then the reason you should always change your choice is that when Monty opened that door and showed you the goat, he made the other door you didn't choose the most likely one, and decreased the entropy (uncertainty) from an initial 1.585 bits to a considerably better 0.918 bits. (That's a 42% decrease in entropy, which is pretty much a jackpot.)
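The Monty Hall figures can be checked with a short Python sketch. It assumes the standard version of the puzzle: three equally likely doors to start, and after Monty shows a goat, a 1/3 chance the car is behind your original door and 2/3 that it is behind the other closed door. `entropy_bits` is an illustrative helper name.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


before = [1 / 3, 1 / 3, 1 / 3]   # car equally likely behind any door
after = [1 / 3, 2 / 3, 0.0]      # your door, the other closed door, the opened door

h_before = entropy_bits(before)
h_after = entropy_bits(after)
decrease = (h_before - h_after) / h_before

print(f"before: {h_before:.3f} bits")
print(f"after:  {h_after:.3f} bits")
print(f"decrease: {decrease:.0%}")
```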

So, in short, it's not about the quality of the data directly. It's about the quality of information in the distribution. The quality of information in the distribution depends in part on the quality of the data, in part on the quantity of the data, in part on the decisions you made in assigning the data to categories, and in part on the quality of the categories themselves. Which is why entropy analyses can so often get much better results from the same data.

- by John Barnes, Blogger

- 1/2/2017 3:49:22 AM

Louis, good to see you again too.

Looking at who some of Trump's analysts were -- and he had a couple very good practitioners of the new stats, for once he wasn't lying or exaggerating about the quality of his people (quite possibly the reason he didn't talk about them much) -- here's my *guess*. Emphasis on guess (i.e., I don't really know, but this is what I'd expect from who was doing it):

Instead of figuring expected values of vote totals for each candidate in each state (the old way), which would have simply told them that they could win but it wasn't likely, they used their poll data to calculate the uncertainty of the result in the very small number of key states (quite possibly just the six states in the Blue Wall that turned into the Red Sieve).

Then in the most-uncertain states, they did breakdowns to identify most uncertain areas (counties, cities, whatever unit worked), and looked at the ones where they could decrease the uncertainty of a large turnout for Trump. In effect, they tilted the distribution (or loaded the dice, though I don't like that metaphor since it implies cheating, and they didn't cheat; they just used information better).

(E.g., with made-up numbers: in Cheese County, Wisconsin, let us say, they noted that the entropy was high and the most likely outcome was a close race, and they concentrated specific ads and messaging to increase turnout in the towns that already liked Trump. Hillary's people were polling the same places but all they saw was about the same for v. against poll result and mostly they just used historical turnout levels because those had usually been accurate enough. Trump's people not only saw what the real situation was more accurately, they could also do constant monitoring of how it changed and adjust their efforts accordingly.)

Moneyball example: Oakland mostly gave up the sacrifice bunt because the out it cost decreased the other team's uncertainty (entropy) of getting through an inning without giving up a run more than it decreased Oakland's uncertainty of getting at least one run. Because other teams only figured expected value -- trading a certain out for an increased probability of a run -- they kept sacrifice bunting. But Oakland was right, because the entropy disadvantage they were looking at measured, in effect, not just the effect on that one play or on that one base runner, but on all the things that might happen later in the inning -- and while it helped the one high-probability case quite a bit, in total, it hurt almost all the numerous other low-probability cases slightly, and that added up to a net loss.

In my current day job in sales, here's a way I use the idea: objectively there are a very small number of prime prospects, and traditional probability would tell me to concentrate fire on the very best. But entropy would tell me that once they're a prime prospect (due to my past efforts) the odds of eventually closing them can only be improved a little. Ditto, the cases where I've quickly established almost-zero chance; effort can only raise them to slightly-better-almost-zero. Instead, I try to concentrate on the numerous cases where the likelihood is vague -- because one more call will either drastically raise or lower their probability and move them into one of the "don't work on this one" categories.

Hope that helps a bit. No, this stuff is not intuitive. On to other questions, good to see you again!

- by bkbeverly, Data Doctor

- 12/29/2016 12:16:44 PM

John,

I need some clarification please on the lemonade stand example.

*To use that economists' favorite example, the kid's lemonade stand, a very quantitative kid using probability might graph daily high temperature against glasses of lemonade sold and improve profits by following the weather forecast. But a modern entropy based kid would instead compute the entropy (which is the uncertainty) of needing extra lemons and sugar, and decide when to make that decision, and over a summer that would reduce unnecessary purchases by much more predictable amount.*

How would you compute the uncertainty of needing extra lemons and sugar? Also to compare the probability of making a profit (revenue earned above operating expenses) to the uncertainty of needing extra lemons and sugar (which are operating expenses) - isn't that an imbalanced comparison? The concept sounds interesting and somewhat like 'backward elimination in regression analyses', but I am having some problems in grasping this. Any additional light that you can shed would be great please.