Sunday, February 3, 2013

Lies, Damn Lies and Statistics


This week we started talking about probability. Well, fellow mathletes, I see an opening to geek out, and completely blow your minds!

We all think we have a sense of probability. If I flip a coin, there are equal odds for each side. If I roll a die, I have a 1 in 6 chance of each number. Easy, right?

What if I flipped a coin 10 times, getting heads 10 times in a row (a 1 in 1024 chance, by the way)? Then, I asked you what the odds are of getting heads on the next flip. The answer: 50% chance. This situation trips up a lot of people, because it seems crazy for such an improbable streak to keep going. However, once you’ve already seen the crazy, improbable streak, it has no control over the next flip; each flip is independent of the ones before it. There is a big difference between the above scenario and asking for the probability of 11 heads in a row from the start (1 in 2048).
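For anyone who wants to check the arithmetic, here is a quick sketch in Python. It assumes a fair coin and independent flips, and the variable names are just mine for illustration.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)  # assumption: a fair coin

# Probability of a streak, asked before any flips happen
p_ten_in_a_row = p_heads ** 10
p_eleven_in_a_row = p_heads ** 11

# Probability of heads on flip 11, given that 10 heads already happened:
# P(11 heads) / P(10 heads)
p_next_given_ten = p_eleven_in_a_row / p_ten_in_a_row

print(p_ten_in_a_row)     # 1/1024
print(p_eleven_in_a_row)  # 1/2048
print(p_next_given_ten)   # 1/2
```

The last line is the whole point: dividing out the streak you have already seen leaves you right back at a plain 50/50 flip.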

We started to see this in the decision trees. And, I really like the layout conceptually, because it avoids the unintuitive thinking. If I take option A, then the overall expected value can completely change, because I can throw away all of option B. In the coin flip problem, this is like having a huge decision tree and taking option A 10 times, leaving just a 50/50 chance of getting one more option A.

Decision Tree of 2 consecutive coin flips


There is a reason I geek out on statistics a little: it’s probably (see what I did there?) the most useful branch of mathematics. We run into statistics every day without noticing. That’s because statistics deals with unknowns, and we’re surrounded by them. In contrast, when was the last time you used calculus in any meaningful way? I can’t think of one, and I’m a freakin engineer! Calculus has its place for sure, but it’s far more successful at getting high school students to hate math. As it turns out, I’m not alone; there’s even a TED talk on this: http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html

But, I digress. Back to my point on unintuitive statistics…

Ok, statistics is at the heart of gambling, so I will bet that there are two people with the same birthday in COR520 (students and faculty). There are 52 of us by my count. Are you up to it? What do you think the odds are of this? If you’re up to it, please stop reading and email me now.

To pass the time, here is a video of dogs playing in the snow.



Welcome back, I graciously accept your bet. I want to say up front that there is certainly a chance that no 2 people in class have the same birthday. But, what are the odds? The first time I was given this puzzle, I thought, “easy, it’s just 52 people divided by 365 days.” In other words, a roughly 14% chance that we have a duplicate birthday. As it turns out, there is roughly a 98% chance of a duplicate birthday.

I don’t know about you, but my mind was blown when hearing that. How can that be when we don’t even come close to filling every day of the year with birthdays?!

Consider this. Assume that everyone has a unique birthday. (Yay, we all get our own special day!) Next quarter we get an additional TA. Besides a collective sigh of relief from the TAs, how would that affect the duplicate birthday problem? This new TA, let’s call him Gary, would have to avoid 52 birthdays for the unique trend to continue, so he has a 52 in 365, or roughly 14%, chance of ruining someone’s day. That’s not a 1/365 chance; it’s way more. And Gary is just the last person in line. Because everyone has to avoid all the other birthdays to remain unique, the odds of the whole group accomplishing this drop very fast as you add people.

Also, keep in mind that I didn’t say who would have the duplicate birthday, just that there are 2 somewhere. This is the difference between saying, “I will win the lottery” and “Someone will win the lottery”. Very different odds. Unfortunately.
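If you’d rather see the numbers than take my word for it, here is a small Python sketch of the calculation. It assumes 365 equally likely birthdays (no leap years, no seasonal baby booms), and the function name is just one I made up.

```python
def prob_shared_birthday(n_people, days_in_year=365):
    """Chance that at least two of n_people share a birthday.

    Assumes every day of the year is equally likely.
    """
    p_all_unique = 1.0
    for already_taken in range(n_people):
        # Each new person has to dodge every birthday already taken
        p_all_unique *= (days_in_year - already_taken) / days_in_year
    return 1.0 - p_all_unique

print(round(prob_shared_birthday(52), 3))  # 0.978, our class
print(round(prob_shared_birthday(23), 3))  # 0.507, already better than a coin flip
print(round(52 / 365, 3))                  # 0.142, Gary's chance of hitting one of 52 unique birthdays
```

Notice that the loop multiplies 52 separate "dodge everyone so far" chances together. Each one is close to 1, but the product shrinks fast, which is why the duplicate odds climb so quickly.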

If you managed to stick with me through all that, I’m truly grateful. I know it’s dry, technical crap that is no fun for anyone. But, in class we are applying statistics to some very real-world problems. The point of this post wasn’t to drown anyone in a bunch of math theory; it was to point out that statistics can be very misleading, with subtle differences in scenarios leading to wildly different outcomes.

In class Wednesday, we talked about a problem where we were given sales amounts and probabilities. We made a big and potentially dangerous assumption in this problem: sales year-to-year are completely independent (i.e. one year’s sales tell you nothing about the next year’s). If we consider the sales numbers year over year as having a correlation, this problem changes dramatically, and is far more complex. Unfortunately for the math haters, the correlated version is a much more accurate statistical model.
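To get a feel for how much that assumption matters, here is a toy Python sketch. Every number in it is made up by me for illustration; none of it comes from the class problem. It assumes each year is either a "low" or "high" sales year, and compares the chance of two low years in a row with and without a link between years.

```python
# All numbers here are hypothetical, chosen only to illustrate the idea.
p_low = 0.5    # assumed chance that any given year is a low-sales year
p_repeat = 0.8 # assumed chance that next year repeats this year's outcome

# Independent years: the second year ignores the first
p_two_low_independent = p_low * p_low

# Correlated years: a low year makes another low year more likely
p_two_low_correlated = p_low * p_repeat

print(p_two_low_independent)  # 0.25
print(p_two_low_correlated)   # 0.4
```

Same first-year odds, but the bad two-year streak goes from 1 in 4 to 2 in 5 just by letting one year influence the next.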