How to Lie with Statistics

Critical Thinking

Introduction

Example

Canada: A Dangerous Place to Live. Despite the best efforts of police departments across the country, the number of murders committed in Canada has increased 6% since 1996.

Canada: A Safe Place to Live. The Canadian gun registry appears to have paid off. The Canadian murder rate has decreased by 4% since 1996, about the time short-barrelled handgun registration started.

Which is true? Both are based on real data (sources: population, homicides1, homicides2).
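Both headlines can hold at once because one reports a count and the other a per-capita rate. As a rough sketch (the function name is made up for illustration, and it assumes the rate is simply murders divided by population), the two quoted percentages together imply how much the population must have grown:

```python
def implied_population_growth(count_change, rate_change):
    """Population growth implied by a change in a count and in its per-capita rate.

    Assumes rate = count / population, so population = count / rate.
    """
    return (1 + count_change) / (1 + rate_change) - 1


# Figures quoted in the two headlines: murders up 6%, murder rate down 4%.
growth = implied_population_growth(0.06, -0.04)
print(f"Implied population growth: {growth:.1%}")  # prints 10.4%
```

So the two statements are consistent if the population grew by about 10% over the same period.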

Question: How many people die in Toronto each year?

Sample Bias

Determine how many marbles of each color are in a jar by sampling. The jar contains 1000 marbles: some red, some yellow, some green, some blue. I want to know how many of each color are in the jar. I select some and, assuming that the sample is indicative of the whole, use the fraction of each color in the sample to extrapolate to the whole.

Sample bias occurs when the sampling method leads to samples which are not indicative of the whole. For example, suppose that blue marbles are much larger than other marbles. A 'random' sample obtained by putting your hand in the jar and grabbing what you come in contact with is likely to favour the blue marbles, giving the impression that they represent a larger fraction of the population.
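The marble example can be simulated. This is only a sketch under made-up assumptions: the jar holds 250 marbles of each color, a blue marble is three times as likely to be grabbed as any other, and marbles are put back after each draw to keep the code short.

```python
import random


def sample_color_fractions(counts, grab_weight, n_draws, seed=0):
    """Estimate color fractions from draws where some colors are easier to grab."""
    rng = random.Random(seed)
    colors = list(counts)
    # Chance of grabbing a color is proportional to
    # (how many marbles it has) x (how easy each one is to grab).
    weights = [counts[c] * grab_weight[c] for c in colors]
    draws = rng.choices(colors, weights=weights, k=n_draws)
    return {c: draws.count(c) / n_draws for c in colors}


# Hypothetical jar: the true fraction of every color is 0.25.
counts = {"red": 250, "yellow": 250, "green": 250, "blue": 250}
fair = {c: 1.0 for c in counts}
biased = dict(fair, blue=3.0)  # assumed: big blue marbles are grabbed 3x as often

fair_est = sample_color_fractions(counts, fair, 10_000)
biased_est = sample_color_fractions(counts, biased, 10_000)
print("fair sampling:  ", fair_est)
print("biased sampling:", biased_est)
```

Under these assumptions the biased sample puts blue near one half of the jar, even though the true fraction is one quarter.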

Example
Number of women in CS obtained via a poll of CSC300. Bias: Does 3rd year represent all years? Might CSC300 draw more women due to its subject matter?
Example
Predict the outcome of an election based on polling. From the Wikipedia article...

A classic example of a biased sample and the misleading results it produced occurred in 1936. In the early days of opinion polling, the American Literary Digest magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an over-representation of individuals who were rich, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50 thousand citizens selected by George Gallup's organization successfully predicted the result, leading to the popularity of the Gallup poll.

Choosing an Average
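Different 'averages'. You have data x1, ..., xn. The mean is (x1 + ... + xn)/n, the median is the middle value once the data are sorted, and the mode is the most frequent value.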
Example
Example: McArnolds salaries

Histogram of salaries for McArnolds Inc.

  450,000  X
  150,000  X
  100,000  XX
   57,000  X             mean
   50,000  XXX
   37,000  XXXX
   30,000  X             median
   20,000  XXXXXXXXXXXX  mode

Which statistic do I report? Depends on who my audience is.

Say I give everyone a 10% raise.

Example: McArnolds raises

Histogram of raises for McArnolds Inc.

   45,000  X
   15,000  X
   10,000  XX
    5,700  X             mean
    5,000  XXX
    3,700  XXXX
    3,000  X             median
    2,000  XXXXXXXXXXXX  mode

Report to employees: The company is very generous, having given out an average salary increase of 5,700.

Report to shareholders: The company is watching the bottom line carefully, having given out an average salary increase of 3,000.
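The three 'averages' in the McArnolds histograms can be checked with Python's standard `statistics` module. The salary counts below are read off the histogram (one X per employee); the variable names are only illustrative.

```python
from statistics import mean, median, mode

# salary -> number of employees, read off the histogram (one X each)
salary_counts = {
    450_000: 1, 150_000: 1, 100_000: 2, 57_000: 1,
    50_000: 3, 37_000: 4, 30_000: 1, 20_000: 12,
}
salaries = [s for s, n in salary_counts.items() for _ in range(n)]
raises = [s // 10 for s in salaries]  # a 10% raise for everyone

for label, data in (("salaries", salaries), ("raises", raises)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

The same data honestly supports "average" figures of 57,000, 30,000 or 20,000 for salaries, and 5,700, 3,000 or 2,000 for raises, depending on which average is quoted.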
When does it NOT matter which average you choose? Essentially when you have a 'normal' distribution: it is symmetric with a single peak, so the mean, median, and mode coincide.
Example
Does the following implication actually hold?

Helping families
----------------
Part of helping hard-working Canadians, of course, is keeping their taxes low as they try to make ends meet. This commitment explains why the average family of four now receives almost $3,100 in extra tax savings thanks to the numerous tax reduction measures introduced by this Government, and why the federal tax burden for all Canadians is now the lowest it's been in 50 years.

From the June 6, 2011 Budget Speech.

What do you need to know to understand this?
Little figures that are not there
Information left out of the statistic may be important.
Example
Great new CS initiatives in 2007 boost Women in CSC108 enrollment by 50%
  • It could be that overall enrollment in CS rose by 50% between 2006 and 2007, so the proportion of women is unchanged (very dishonest).
  • If 4 of 100 students in CSC108 were women in 2006 and 6 of 100 in 2007, is this significant? In this case, the number of events (women enrolled) is small compared with the sample size. For counts this small, you expect large relative variance, so a change of two students may just be chance.
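To get a feel for the second bullet, here is a rough check under a simplifying assumption that is not in the notes: each of the 100 students is independently a woman with the 2006 proportion of 4%, so the count follows a binomial distribution. The function name is illustrative.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing 6 or more women out of 100 if nothing changed since 2006.
p_six_or_more = binom_tail(100, 0.04, 6)
print(f"P(6 or more out of 100 at a 4% rate) = {p_six_or_more:.2f}")
```

Under this assumption the probability is roughly 0.2, so a jump from 4 to 6 is well within ordinary year-to-year variation and says little about the new initiatives.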
Graphs
Example




Example
In the News: The Schiavo Case. A sample of the population was asked:
"Do you agree with the court's decision to remove the feeding tube?"
The results were recorded according to people's political affiliation.

Updated graphic after complaints


Cause and Effect
Beware of conclusions drawn from statistics.
Example
82% of prisoners are high school dropouts. Conclusion: Dropping out of high school implies you will end up in prison!
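The statistic gives P(dropout | prisoner), while the conclusion is about P(prisoner | dropout). A small sketch with Bayes' rule shows how far apart the two can be. Only the 82% comes from the example; the other two figures (0.5% of people in prison, 15% of people being dropouts) are invented purely for illustration.

```python
def p_prison_given_dropout(p_dropout_given_prison, p_prison, p_dropout):
    """Bayes' rule: P(prison | dropout) = P(dropout | prison) * P(prison) / P(dropout)."""
    return p_dropout_given_prison * p_prison / p_dropout


# 0.82 is from the example; 0.005 and 0.15 are hypothetical population figures.
p = p_prison_given_dropout(0.82, p_prison=0.005, p_dropout=0.15)
print(f"P(prison | dropout) = {p:.1%}")  # prints 2.7%
```

With these made-up figures, fewer than 3 in 100 dropouts are in prison even though 82 in 100 prisoners are dropouts.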

Also, from the same article...

Dropping out, in turn, causes other secondary, indirect problems: Public Assistance. High school dropouts are also more likely to receive public assistance than high school graduates who do not go on to college. In fact, one national study noted that dropouts comprise nearly half of the heads of households on welfare.

Question: Is the last line evidence of the first?

Exercise: Construct an example where high school dropouts are less likely to receive public assistance, and yet dropouts comprise nearly half of the heads of households on welfare.

Example
Lifespan in Toronto is 81 years, in the suburbs it is 83 years. Conclusion: It is unhealthy to live in Toronto.
Possibilities:
Misunderstanding of statistics
coincidence
Example
Lottery/gym story.
In 100 coin flips, how likely is a run of 10 heads?
Considering the number of murders in Toronto in 2007, is it likely that there will be 5 events in one week?
See: Toronto Homicides in 2007; By the numbers: Toronto's murder rate drops sharply in 2011.
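The coin-flip question can be answered exactly with a small dynamic program. This is one possible sketch, not necessarily the method used in the lecture; it assumes a fair coin and independent flips, and reads "a run of 10 heads" as at least 10 heads in a row somewhere in the sequence.

```python
def prob_run_of_heads(n_flips, run_length, p_heads=0.5):
    """Probability of at least one run of `run_length` heads in `n_flips` flips."""
    # state[j] = probability that no full run has occurred yet and the
    # sequence currently ends in exactly j consecutive heads.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new = [0.0] * run_length
        new[0] = sum(state) * (1 - p_heads)  # a tail resets the streak
        for j in range(run_length - 1):
            new[j + 1] = state[j] * p_heads  # a head extends the streak
        # A head from streak run_length - 1 completes a run, so that
        # probability leaves the "no run yet" states.
        state = new
    return 1 - sum(state)


p_run = prob_run_of_heads(100, 10)
print(f"P(run of 10 heads in 100 flips) = {p_run:.3f}")
```

The answer comes out to a few percent: unlikely for any one person, but bound to happen to someone when many people each flip 100 coins.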
false positives, false negatives in medical tests
Example (Not based on a real AIDS test)
Imagine that you have a test for AIDS that is 99.9% accurate: 99.9% of people with AIDS test positive, and 99.9% of people without AIDS test negative. There were 56,000-58,000 people in Canada with AIDS in 2008. What does it mean if someone tests positive for AIDS?

56,000/30,000,000 people in Canada have AIDS.
Approximately .001 * 30,000,000 = 30,000 people in Canada will have false positives.
Now, given that a person tested positive, how likely is it that they don't have AIDS? About 86,000 people test positive (56,000 true positives plus 30,000 false positives), so 30,000/86,000 = .35: approximately 1 in 3 people who test positive do not have AIDS.
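The same calculation can be done without rounding using Bayes' rule. This sketch assumes that "99.9% accurate" means both the true-positive rate (sensitivity) and the true-negative rate (specificity) are 0.999; the function and parameter names are illustrative.

```python
def p_healthy_given_positive(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive does not have the disease."""
    true_pos = prevalence * sensitivity               # sick and flagged
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy but flagged
    return false_pos / (true_pos + false_pos)


# Figures from the example: 56,000 cases in a population of 30,000,000.
p_false_alarm = p_healthy_given_positive(56_000 / 30_000_000, 0.999, 0.999)
print(f"P(no AIDS | positive test) = {p_false_alarm:.2f}")  # prints 0.35
```

The exact figure agrees with the rough 30,000/86,000 estimate: about one positive result in three is a false alarm.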
Something to think about
"California Marijuana Decriminalization Drops Youth Crime Rate To Record Low: Study"
What does it mean?
References
HUFF, D. (1954). How to Lie with Statistics (illust. I. Geis). Norton, New York.
Questions and Answers
Question:
Also, I was thinking about the AIDS example and have discovered something that doesn't make sense. The "true" accuracy of the test (approximately 1 in 3 people that test positive do not have AIDS) is calculated using the population of people that have AIDS and the total population. That means that the accuracy of the test CHANGES in areas where more or fewer people have AIDS. This doesn't make sense. If it is the same test, then the accuracy should be the same.

Example: Your example said 56,000 people with AIDS in 2008. Say that due to a manufacturing error in contraceptives, 100,000 end up with AIDS in 2009. Doing the same math yields these results:

100,000/30,000,000 people in Canada with AIDS.
Approximately .001 * 30,000,000 = 30,000 people in Canada will have false positives.
Now given that the person tested positive, how likely is it that they don't have AIDS? 30,000/130,000 = .23, approximately 1 in 4 people that test positive do not have AIDS!

The numbers would also change in populations where fewer people have AIDS. Say a cure is developed and only 20,000 people have AIDS in 2009:

20,000/30,000,000 people in Canada with AIDS.
Approximately .001 * 30,000,000 = 30,000 people in Canada will have false positives.
Now given that the person tested positive, how likely is it that they don't have AIDS? 30,000/50,000 = .6, approximately 3 in 5 people that test positive do not have AIDS!

If this is the same test, then the accuracy of the test shouldn't change like this... Am I being obnoxious?
Answer:

Yes, I am glad you are being obnoxious, and yes, this is where the intuition and the reality do not match. See, you thought about it! For rare diseases, a positive test is more likely the result of a false positive. For common diseases, a positive test is more likely the result of a real carrier.

In fact, you do understand. That's why you are surprised.
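A short sketch makes the point of the answer concrete. The test's error rate is held fixed at 0.1% while the number of infected people varies over the three scenarios in the question. It uses the same rough approximation as the notes (false positives = error rate times the whole population); the function name is illustrative.

```python
def share_of_false_positives(infected, population=30_000_000, error_rate=0.001):
    """Fraction of positive results that are false, using the notes' approximation."""
    false_pos = error_rate * population  # about 30,000 with the default figures
    return false_pos / (infected + false_pos)


# The test itself never changes; only the number of infected people does.
shares = {n: share_of_false_positives(n) for n in (20_000, 56_000, 100_000)}
for n, share in shares.items():
    print(f"{n:>7,} infected -> {share:.2f} of positives are false")
```

The error rate of the test is a property of the test and stays at 0.1% throughout. What changes with how common the disease is is the fraction of positive results that are wrong (about 0.6, 0.35 and 0.23 here).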