Margin of Error

With the pending USA election, the news is awash with poll results showing voters’ preferences for each candidate. When these results are presented, there is usually a caveat that “however, these results are within the margin of error”, which makes the results a bit less conclusive. Why is that?

Without going through the plethora of maths that arrives at what follows, let me explain.

If we are trying to determine a parameter of a population (like the percentage of people that prefer a candidate), we need to ask everyone in a population in order to know the answer exactly. This is impossible in many situations, especially in the USA where all the people who will vote cannot be asked the question. So a sample of voters must be used. Now there are a lot of things to be considered to make sure that the sample used is truly random (that is, not biased), but let’s assume going forward, that the samples used are random.

First, without the math, for large samples, the distribution of the parameter being measured is approximately normal. This is fancy statistical wording that means the values one gets taking sample after sample will follow a bell curve:

This curve is adjusted so that the probability of the parameter of interest between two values is the area under the curve. This means that the area under the entire curve must be 100%. So the probability that the parameter is between a and b, based on the sample, is the shaded area below:

If this is 68% of the total area, then that is the probability that the parameter being measured is between a and b.

Now let’s get to the current scenario. Suppose 1000 people are surveyed and 52% prefer candidate A and 48% prefer candidate B. Let’s look at the associated bell curve for candidate A:

A lot of math is involved here and a lot of assumptions (though they are reasonable). Notice that the curve is centered at the sample result of 52% (0.52 is the decimal equivalent). The range 0.49 to 0.55, that is 0.52 – 0.03 to 0.52 + 0.03, contains 95% of the area (that is, the probability). Without going through a lot of theory here, this range of numbers is the 95% confidence interval for this sample. So a statistician can say “based on this sample, I am 95% confident that the true percentage of all voters who support candidate A is between 49% and 55%”. This means that based on this sample, the true preference for candidate A can be as low as 49%. The number 0.03, which is added to and subtracted from the sample result, is called the margin of error. This 95% confidence interval is the most common one used.

Now let’s look at the bell curve for candidate B and its associated confidence interval:

Notice that the 95% confidence interval for this result is 45% to 51%. That is, based on this sample, we can be 95% confident that the true percentage of all voters who support candidate B is between 45% and 51%. This means that the true preference for candidate B can be as high as 51%. And this is higher than the possible low preference for candidate A at 49%. That means that even though the sample shows that candidate A is preferred, the difference between the two values is not significant enough to state that candidate A is truly preferred at the 95% confidence level. In other words, the result is within the margin of error.

Now let’s say that the survey result was that candidate A was preferred at 55% and candidate B at 45%. The confidence interval of candidate A’s bell curve would be 52% to 58%. Candidate B’s confidence interval would be 42% to 48%. So based on this sample, the highest that the true preference for candidate B could be is 48% and the lowest that the true preference for candidate A could be is 52%. There is no overlap here, so this would be significant enough to say that all voters do prefer candidate A. That is, the result is outside the margin of error. And when you see results with that statement, that is much better for candidate A.
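If you’d like to check these intervals yourself, here is a minimal Python sketch. It assumes the usual normal-approximation formula for a sample proportion, margin of error = 1.96 × √(p(1 – p)/n), which this post skipped over; the function names are my own, purely for illustration.

```python
import math

Z_95 = 1.96  # standard normal multiplier for a 95% interval (assumed formula)


def margin_of_error(p_hat, n, z=Z_95):
    """Approximate margin of error for a sample proportion p_hat from n people."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat, n, z=Z_95):
    """Return (low, high): the sample result minus and plus the margin of error."""
    moe = margin_of_error(p_hat, n, z)
    return p_hat - moe, p_hat + moe


def intervals_overlap(p_a, p_b, n):
    """True if the two candidates' 95% intervals share any values."""
    low_a, high_a = confidence_interval(p_a, n)
    low_b, high_b = confidence_interval(p_b, n)
    return low_a <= high_b and low_b <= high_a


# The two surveys from the post, each with 1000 people.
for p_a, p_b in [(0.52, 0.48), (0.55, 0.45)]:
    verdict = "within" if intervals_overlap(p_a, p_b, 1000) else "outside"
    print(f"A={p_a:.0%}, B={p_b:.0%}: {verdict} the margin of error")
```

For 1000 people the margin of error comes out at roughly 0.03 in both surveys, which is where the 3% above comes from.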

Statistics – Combinations (Selections)

In my last post, I described how you can find all the ways to arrange x things from a group of n things. Here, order matters, and the equation to calculate this is

\[P\left(n,x\right)=\frac{n!}{\left(n-x\right)!}\]

If this looks strange, please read my last post. Now let’s talk about counting things where order does not matter, for example, picking a team of players.

The math term for this is combinations. Let’s introduce this with an example. How many ways can you arrange the letters A, B, and C? From my last post, you know that this is 3! = 6. ABC and CBA are two different arrangements. Now how many ways can you select all 3 letters? Well, there is only one way that can be done. ABC and CBA are the same selection so are only counted once. Notice that for a given n and x, there are fewer selections than arrangements. In this example, there are 3! = 6 times as many arrangements as selections.

Now let’s modify this example. Suppose we want to select 3 letters from ABCDE. For any 3 of the letters chosen, there will be 3! times more arrangements than selections, which means that if we use the permutation formula above to answer this question, the answer would be 3! times too large. Generalising this, there are x! times more arrangements than selections for a given n and x. This allows us to modify the formula above by dividing it by x! to get the combinations formula

\[C\left(n,x\right)=nCx=\left(\begin{matrix}n\\x\\\end{matrix}\right)=\frac{n!}{x!\left(n-x\right)!}\]

The left side shows some of the different notations used and the right side is the actual formula. As with permutations, you can use a CAS calculator to do this calculation with the nCr function. Selecting 3 letters from 5, there are

\[C\left(5,\ 3\right)=\frac{5!}{3!\left(5-3\right)!}=10\]

ways to do that. This would also answer the questions: how many ways can you select a team of 3 people from 5 people or how many 3-card hands can be dealt from a deck of 5 cards?

Speaking of cards, how many 5-card poker hands can be dealt from a standard deck of 52 cards?

\[C\left(52,\ 5\right)=\frac{52!}{5!\left(52-5\right)!}=2,598,960\]
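If you have Python handy, the following sketch lists the arrangements and selections of 3 letters from ABCDE and confirms that there are 3! times as many arrangements. It uses the standard library’s `itertools` and `math.comb` (Python 3.8 or later); the variable names are just for illustration.

```python
import math
from itertools import combinations, permutations

letters = "ABCDE"

# Order matters: ('A', 'B', 'C') and ('C', 'B', 'A') are both listed.
arrangements = list(permutations(letters, 3))

# Order does not matter: each group of 3 letters is listed once.
selections = list(combinations(letters, 3))

print(len(arrangements), "arrangements,", len(selections), "selections")

# Every selection of 3 letters can be arranged in 3! ways.
assert len(arrangements) == len(selections) * math.factorial(3)

# math.comb evaluates n! / (x! (n - x)!) directly.
print(math.comb(5, 3))   # selecting 3 letters from 5
print(math.comb(52, 5))  # 5-card poker hands
```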

Now let’s put this in practice. There are many lotto games around based on picking 6 numbers out of 45. Let’s first calculate how many ways you can select 6 numbers out of 45:

\[C\left(45,\ 6\right)=\frac{45!}{6!\left(45-6\right)!}=8,145,060\]

From my post on basic probability, the probability of your lotto ticket with a single set of 6 numbers winning is

\[\text{probability of winning}=\frac{1}{8145060}=0.000000123=0.0000123\text{%}\]

Now if you buy a block of 50 sets of numbers, how much does that improve your chances of winning? This is a binomial distribution problem which is beyond the scope of this post, but that calculation uses the probability above to get a 0.000614% chance that at least one of the sets wins. That’s about a 1 in 162,902 chance of winning with a 50 pick lotto card. In Australia, there is a 1 in 12,000 chance of being hit by lightning. Just make sure you’re not standing outside when you buy your ticket.
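Here is a rough Python sketch of the lotto numbers. It assumes, in line with the binomial approach mentioned above, that each of the 50 picks is an independent try with the same single-ticket probability; the variable names are mine.

```python
import math

# Number of ways to select 6 numbers out of 45.
total_selections = math.comb(45, 6)

# Probability that one set of 6 numbers wins.
p_single = 1 / total_selections

# Probability that at least one of 50 picks wins:
# 1 minus the probability that all 50 lose (assumes independent picks).
picks = 50
p_at_least_one = 1 - (1 - p_single) ** picks

print(f"ways to pick 6 from 45: {total_selections:,}")
print(f"single ticket: {p_single * 100:.7f}%")
print(f"50 picks:      {p_at_least_one * 100:.6f}%")
print(f"about 1 chance in {1 / p_at_least_one:,.0f}")
```

Because the single-ticket probability is so tiny, the result is almost exactly 50 times the single-ticket chance.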

Statistics – Permutations (Arrangements)

I will discuss counting two types of picking a group of items from a large number of items. These two types are called permutations (also called arrangements) and combinations (also called selections).

Combinations are when the order of the picking does not matter. For example, when picking 5 cards from a 52 card deck, the order does not matter: Ace, 2, 3, 4, 5 is the same hand as 5, 4, 3, 2, Ace (assuming the suits are the same). Or another example is how many 5 player teams can be made from 30 people. I will discuss combinations in a subsequent post. This post is about permutations, where the order of things picked does matter.

An example of a permutation problem is how many ways you can arrange 5 guests at a table from a group of 50 people. Here, order matters: Adam, Betty, Charlie, David, Eddie arranged in that order is different from Eddie, David, Charlie, Betty, Adam.

Let’s look at a simple example and extrapolate from that.

From 5 people, how many ways can we seat 3 of them? There are 5 ways to pick the first person. Now there are only 4 people left, so there are 4 ways to pick the next person. Now there are 3 people left so we only have 3 ways to pick the last person. So the number of ways is the 5 ways to pick the first times the 4 ways to pick the second times the 3 ways to pick the last person: 5 × 4 × 3 = 60 ways to arrange 3 people from a group of 5. If you follow this pattern, you can arrange 5 people from a group of 10, 10 × 9 × 8 × 7 × 6 = 30,240 ways.

This can be generalised: how many ways can you arrange x things from n things. Before I show the formula for this, I need to explain new notation.

Using a “!” after a number has a meaning in maths. This is called a factorial. As an example, 5! = 5 × 4 × 3 × 2 × 1 = 120. So a factorial is successively multiplying by one less number. Factorials increase quickly. 40! is a number slightly greater than 8 followed by 47 zeroes. Factorials are used in maths formulas frequently and in order to make these consistent, 0! is defined as 1. It doesn’t look right, but it must be defined this way as we will see.
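As a small illustration, here is a Python sketch of a factorial. The function is my own, written out as a loop so you can see the successive multiplying; Python’s built-in `math.factorial` does the same job.

```python
import math


def factorial(n):
    """Multiply n x (n - 1) x ... x 2; an empty product (n = 0 or 1) is 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial(5))
print(factorial(0))  # nothing to multiply, so the result stays at 1
print(f"{factorial(40):.3e}")  # scientific notation shows how fast it grows

# Agrees with the standard library version.
assert factorial(40) == math.factorial(40)
```

Notice that the loop gives 0! = 1 without any special case: with nothing to multiply, the starting value of 1 is simply returned.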

Looking at the examples above, we have partial factorials: instead of 5 × 4 × 3 × 2 × 1, we have 5 × 4 × 3, or instead of 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 we have 10 × 9 × 8 × 7 × 6. Notice that 5! can be thought of as 5 × 4 × 3 × 2! and 10! can be 10 × 9 × 8 × 7 × 6 × 5!. In the first example, the “2” is 5 – 3, that is the number of people minus the number in the arrangement. In the second, the “5” is 10 – 5, that is the number of people minus the number in the arrangement. If we let n be the number of total things and x the number of the things to be arranged, then the formula to compute this in general is:

\[P\left(n,x\right)={^n}P_x=P_x^n=nPx=\frac{n!}{\left(n-x\right)!}\]

The formula is the far right expression, the notations on the left are the common notations used in different places that mean the same thing. So applying this to our two examples:

\[P(5,3)=\frac{5!}{\left(5-3\right)!}=\frac{5\times4\times3\times2!}{2!}=5\times4\times3=60\] \[P\left(10,5\right)=\frac{10!}{\left(10-5\right)!}=\frac{10\times9\times8\times7\times6\times5!}{5!}=10\times9\times8\times7\times6=30,240\]

The 2! and the 5! cancel out in the fractions and we get the result we want. If we wanted to arrange 5 things from a group of 5, we use the definition that 0! = 1:

\[P\left(5,5\right)=\frac{5!}{\left(5-5\right)!}=\frac{5!}{0!}=\frac{5!}{1}=5\times4\times3\times2\times1=120\]

Because of how large factorials grow, if calculating this formula by hand, it is better to first cancel the (n – x)! part from the numerator, then calculate the result.

If you are fortunate to own a CAS calculator, using the permutation function nPr gets the same result with less work: nPr(10,5) = 30,240.
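If you don’t have a CAS calculator, a few lines of Python will do. This sketch uses my own helper, `permutations_count`, which multiplies only the terms left after cancelling (n – x)!, and compares it with the standard library’s `math.perm` (Python 3.8 or later).

```python
import math


def permutations_count(n, x):
    """P(n, x) as n x (n - 1) x ... x (n - x + 1), i.e. with (n - x)! already cancelled."""
    result = 1
    for k in range(n, n - x, -1):
        result *= k
    return result


for n, x in [(5, 3), (10, 5), (5, 5)]:
    by_cancelling = permutations_count(n, x)
    by_formula = math.factorial(n) // math.factorial(n - x)
    # Both routes, and the built-in math.perm, agree.
    assert by_cancelling == by_formula == math.perm(n, x)
    print(f"P({n},{x}) = {by_cancelling:,}")
```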

Now this does not directly answer questions about picks where order does not matter, like the number of poker hands. That is a combination question and I will talk about that in my next post.

Statistics – Probability of Conditional Events

This is about the probability of an event given some information. What follows assumes you know how to calculate basic probabilities (two posts ago) and the probability of an intersection of events (my last post).

Let’s start with an example. What is the probability of rolling a 6 on the roll of a die? From basic probability, we know that it is the number of ways to roll a 6 (only 1 way) divided by the number of total things that can happen (6). So the probability is 1/6. Now what if the die is rolled and a friend cheats by telling you that the number rolled is odd. Intuitively, you would say that the probability is 0 as 6 is an even number, so the additional information tells you that a 6 is not possible. The probability of a 6 and an odd number is 0 because the number of ways you can roll a 6 and an odd number is 0.

Now if the die is rolled and your friend says that the number rolled is even, what is the probability that a 6 was rolled? Intuitively, knowing that the number is even should increase the chances that a 6 was rolled. We can answer this using the basic probability formula: the number of ways to roll a 6 and an even number divided by the number of even numbers. Knowing that the number is even reduces the number of total things that can happen from 6 to 3. And the number of ways you can roll a 6 and an even number is 1. So the new probability, thanks to your friend, is 1/3.

As always, maths has notation for this. Let A and B be two events. Then the notation for “the probability of event A given (or on condition that) B occurred” is P(A|B). From the examples above, if A is the event of rolling a 6, and B is the event of rolling an odd number (or, in the second example, an even number), then the equation to calculate this is

\[P\left(A\middle|B\right)=\frac{n\left(A\cap B\right)}{n\left(B\right)}\]

From my last two posts, remember that n(something) means the number of ways that something can occur, and the symbol ∩ means intersection or “and”.

This equation can be shown to be equivalent to

\[P\left(A\middle|B\right)=\frac{P\left(A\cap B\right)}{P\left(B\right)}\]

where the probabilities are used instead of the numbers. This can be rearranged to give what is called the multiplication rule of probability

\[P\left(A\cap B\right)=P\left(A\middle|B\right)P\left(B\right)\]

So if P(A ∩ B) = 0.3 and P(B) = 0.7, then

\[P\left(A\middle|B\right)=\frac{P\left(A\cap B\right)}{P\left(B\right)}=\frac{0.3}{0.7}=\frac{3}{7}\]
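To see the counting version of this formula at work on the die example, here is a short Python sketch. The sets and the `conditional` helper are my own names, and `Fraction` keeps the answers as exact fractions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
six = {6}
odd = {1, 3, 5}
even = {2, 4, 6}


def probability(event):
    """n(event) divided by n(sample space)."""
    return Fraction(len(event), len(sample_space))


def conditional(a, b):
    """P(A|B) = n(A and B) / n(B); the & operator is set intersection."""
    return Fraction(len(a & b), len(b))


print("P(6 | odd)  =", conditional(six, odd))
print("P(6 | even) =", conditional(six, even))

# Multiplication rule: P(A and B) = P(A|B) x P(B).
assert probability(six & even) == conditional(six, even) * probability(even)
```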

Another example that shows conditional probability and the multiplication rule of probability in action is the following:

There is a bag with 10 marbles in it: 4 red and 6 blue ones. Two marbles are picked from the bag without replacing the first marble picked. “Without replacing” is important because the probability of the second marble’s color is affected by the first marble picked. If the first marble was replaced, the probability of the second marble’s color would not depend on the first marble’s color, that is, the two picks would be independent of each other.

So let’s look at some of the probabilities in this experiment. The probability that the first marble is red is P(R₁) = 4/10 = 2/5. Now the probability of the second marble picked is dependent on that as there is 1 less red marble and 1 less marble in total. So the probability that the second marble is red is P(R₂|R₁) = 3/9 = 1/3 because there are only 3 red marbles left out of the 9 marbles left. Likewise, P(B₂|R₁) = 6/9 = 2/3. For simple experiments like this, tree diagrams are often used to get a complete picture of all the possibilities:

The last column of combination probabilities uses the multiplication rule previously stated. Using tree diagrams like this, you can answer many questions about the experiment by adding these probabilities:

1. What is the probability of picking just one red marble?

\[P\left({B_2\cap R}_1\right)+P\left({R_2\cap B}_1\right)=\frac{4}{15}+\frac{4}{15}=\frac{8}{15}\]

2. What is the probability of picking two marbles of the same color?

\[P\left({R_2\cap R}_1\right)+\ P\left({B_2\cap B}_1\right)=\frac{2}{15}+\frac{1}{3}=\frac{7}{15}\]

3. What is the probability of picking at least one red marble?

\[P\left({R_2\cap R}_1\right)+P\left({B_2\cap R}_1\right)+P\left({R_2\cap B}_1\right)=\frac{2}{15}+\frac{4}{15}+\frac{4}{15}=\frac{10}{15}=\frac{2}{3}\]

Note that since the last column includes all the possible ways this experiment can go, all of these probabilities add up to 1. So to answer question 3, a more efficient way to calculate the answer is to subtract the one possibility excluded from 1:

\[1-P\left({B_2\cap B}_1\right)=1-\frac{1}{3}=\frac{2}{3}\]
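The four branch probabilities of the marble experiment can also be built in code. This is a minimal Python sketch with my own function and variable names; each branch is the first-pick probability times the conditional second-pick probability, exactly as in the multiplication rule.

```python
from fractions import Fraction

start = {"R": 4, "B": 6}  # 4 red and 6 blue marbles


def two_draw_probabilities(counts):
    """Map (first, second) colors to P(first) x P(second | first), without replacement."""
    total = sum(counts.values())
    outcomes = {}
    for first in counts:
        p_first = Fraction(counts[first], total)
        remaining = dict(counts)
        remaining[first] -= 1  # the first marble is not put back
        for second in remaining:
            p_second_given_first = Fraction(remaining[second], total - 1)
            outcomes[(first, second)] = p_first * p_second_given_first
    return outcomes


tree = two_draw_probabilities(start)
for (first, second), p in tree.items():
    print(f"first {first}, second {second}: {p}")

just_one_red = tree[("R", "B")] + tree[("B", "R")]
same_color = tree[("R", "R")] + tree[("B", "B")]
at_least_one_red = 1 - tree[("B", "B")]
print(just_one_red, same_color, at_least_one_red)
```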

Counting the number of ways an event can happen in simple experiments like this is easy to do in your head. But what about questions like “how many poker hands (5 cards) can be made from a standard deck of 52 cards?”. Not so easy. So next time, I will talk about how we can “count” large possibilities like this.

Statistics – Probability of Combined Events

I ended my last post showing the probability of picking a type of card from a standard deck of 52 cards. For example, if the event of interest, A, is picking a Jack, then the probability of picking a Jack from a shuffled deck of cards is

\[P\left(A\right)=\frac{4}{52}=\frac{1}{13}\]

because there are 4 ways to pick a Jack out of 52 cards. Now let’s consider probabilities of events like “picking a Jack or a Heart” or “a face card and a Heart”.

If we let event A be picking a Jack, B be picking a Heart, and C be picking a face card (Jack, Queen, or King), then the maths notation for these statements is

\[P\left(A\cup B\right)=\mathrm{probability\ of\ picking\ a\ Jack\ or\ a\ Heart}\] \[P\left(B\cap C\right)=\mathrm{probability\ of\ picking\ a\ face\ card\ and\ a\ Heart}\]

The symbol “∪” stands for the union of two events, but in English, you can use the word “or”: A ∪ B = “A union B” or “A or B”. The symbol “∩” stands for the intersection of two events, but in English, you can use the word “and”: B ∩ C = “B intersection C” or “B and C”. These concepts are easily seen in a Venn diagram:

Circle A is the set of all Jacks and circle B is the set of all Hearts. Now the probability of picking a card from set A is 4/52. The probability of picking a card from set B is 13/52. You may be tempted to say that the probability of A or B is the sum of the two individual probabilities. But both of these probabilities include the Jack of Hearts, so it is counted twice. We have to subtract out this intersection of the two events, so in maths notation:

\[P\left(A\cup B\right)=P\left(A\right)+P\left(B\right)-P\left(A\cap B\right)\]

This equation can be rearranged to show that the probability of the intersection of the two events is equal to the sum of the individual probabilities minus the probability of the union:

\[P\left(A\cap B\right)=P\left(A\right)+P\left(B\right)-P\left(A\cup B\right)\]

These two equations are different forms of what is called the addition rule of probability.

So P(A ∪ B) = 4/52 + 13/52 – 1/52 = 16/52, because P(A ∩ B) is the probability of a Jack and a Heart. Only one card satisfies this, the Jack of Hearts, so the probability of that is 1/52.

Now let’s define event D as picking a Diamond and consider the probability of picking a Heart and a Diamond, P(B ∩ D). This is clearly 0 as a card cannot be both suits. The associated Venn diagram looks like:

Events like this are called mutually exclusive, that is, you can pick one or the other, the picked card cannot be both. For mutually exclusive events:

\[P\left(B\cup D\right)=P\left(B\right)+P\left(D\right)\ \mathrm{and}\ P\left(B\cap D\right) =0\]
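Python’s sets have union (`|`) and intersection (`&`) built in, so the addition rule is easy to check by counting cards. This is a small sketch with my own names for the deck and the events.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Hearts", "Clubs", "Diamonds", "Spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Number of cards in the event divided by the number of cards in the deck."""
    return Fraction(len(event), len(deck))


jacks = {card for card in deck if card[0] == "J"}
hearts = {card for card in deck if card[1] == "Hearts"}
diamonds = {card for card in deck if card[1] == "Diamonds"}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
assert prob(jacks | hearts) == prob(jacks) + prob(hearts) - prob(jacks & hearts)
print("P(Jack or Heart)     =", prob(jacks | hearts))

# Mutually exclusive events have an empty intersection.
print("P(Heart and Diamond) =", prob(hearts & diamonds))
print("P(Heart or Diamond)  =", prob(hearts | diamonds))
```

Note that `Fraction` prints 16/52 in lowest terms, as 4/13.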

In my next post, I will discuss what are called conditional probabilities and explore the probability of picking a Jack given that the card is a Heart.

Statistics – Basic Probability

As I am covering this topic now with many of my students, let’s start a series on statistics.

The first concept taught when introducing statistics to students is that of probability. Let’s start with the experiment of rolling a die. I italicise experiment because it is a formal term in statistics. I will italicise other terms in this post.

If we are interested in the outcome or the event of rolling a “3”, what is the probability of that occurring? As there are six possible outcomes, all equally likely, and “3” is just one of them, then the probability is 1/6 or 1 out of 6. As maths likes to use shorthand notation to represent concepts, let’s notationise (my word) this.

Let A represent the event of rolling a “3”. The probability of this is represented by P(A). The probability of rolling a “3” is the number of ways a “3” can occur (one way) divided by the total number of things that can occur (six). So to generalise this, for experiments where all outcomes are equally likely, the probability of an event A is

\[P\left(A\right)=\frac{\mathrm{number\ of\ ways}\ A\ \mathrm{can\ occur}\ }{\mathrm{total\ number\ of\ things\ that\ can\ occur}}=\frac{n\left(A\right)}{n\left(\xi\right)}\]

Now I’ve introduced some new notation here. n(A) is notation that means “number of ways A can occur”. The Greek letter xi, ξ, represents the set of all the things that can happen. In this case, ξ = {1, 2, 3, 4, 5, 6}. This is also called the sample space of the experiment. So n(ξ) = 6. In our experiment and the event of interest (rolling a “3”), n(A) = 1 and n(ξ) = 6 so P(A) = 1/6.

So what is the probability of rolling an even number? Here, A = “rolling an even number”. As there are three even numbers, or three ways, that this can occur, then P(A) = 3/6 = 1/2.

Let me say a few general things about probability. The probability of an event is always a number between 0 and 1 including 0 and 1. At the extreme ends, if P(A) = 0, then event A has no chance of occurring. So in our experiment, if A is the event of rolling a “7”, then P(A) = 0. If P(A) = 1, then the event is a certainty to happen. If A is the event of rolling an odd or even number, then P(A) = 1.

Now let’s look at a slightly more complex experiment: picking a card from a standard deck of cards. A standard deck of 52 cards has four suits (hearts, clubs, diamonds, spades) of 13 cards each. Each suit consists of an Ace, Jack, Queen, King, and numbered cards 2 to 10. The Jack, Queen, and King cards are called face cards. So this experiment is choosing one card out of a shuffled deck of cards.

If A = choosing a Heart, then P(A) = 13/52 = 1/4 because there are 13 ways to pick a Heart out of 52 ways any card can be picked. Now let A = choosing a Jack. Here P(A) = 4/52 = 1/13 as there are 4 Jacks in a deck of cards. Now let A = picking a face card. Then P(A) = 12/52 =3/13 as there are 12 face cards in a deck.
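The counting formula translates almost word for word into code. In this Python sketch, `probability` is my own helper: it counts the outcomes where the event happens and divides by the size of the sample space, which only works when all outcomes are equally likely.

```python
from fractions import Fraction


def probability(event, sample_space):
    """n(A) / n(sample space), where event(outcome) returns True when A occurs."""
    ways = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(ways, len(sample_space))


# Experiment 1: rolling a die.
die = [1, 2, 3, 4, 5, 6]
print("P(3)    =", probability(lambda roll: roll == 3, die))
print("P(even) =", probability(lambda roll: roll % 2 == 0, die))
print("P(7)    =", probability(lambda roll: roll == 7, die))

# Experiment 2: picking one card from a standard deck.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Hearts", "Clubs", "Diamonds", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]
print("P(Heart)     =", probability(lambda card: card[1] == "Hearts", deck))
print("P(Jack)      =", probability(lambda card: card[0] == "J", deck))
print("P(face card) =", probability(lambda card: card[0] in ("J", "Q", "K"), deck))
```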

In my next post, we’ll explore how to handle more complex events like choosing a Queen or a Heart.

Happy Birthday!

For today’s post, I thought I’d return to statistics. Remember the Monty Hall problem I talked about last year? If not, do a search on “Monty” on the Blog page. That was an example of statistics defying common sense. Today’s post is another one of those.

This post is about the probability of any two people in a group of people in a room having the same birthday. But let’s simplify this scenario with an equivalent one. Suppose we have a random number generator that generates a number between 1 and 365, including 1 and 365 – kind of like a 365 faced die. Let’s say we “roll” this die twice. What is the probability that the two numbers generated are the same, the successful event?

As is often the case in statistics, it is easier to look at the probability of the unsuccessful events. You can then subtract that from 1 to get the probability of the successful event since

Probability of Success + Probability of Failure = 1 or

Probability of Success = 1 – Probability of Failure

since one or the other must happen. Remember that a certainty in probability is “1”, absolutely no chance is “0” and other probabilities are between those two numbers. For example, the probability of flipping a heads is 0.5. Please see my posts on probability for a review if needed.

So if we roll this 365-faced die twice, the first roll sets the number, the chance of the second roll matching that number is 1/365, and the chance of not matching it is 364/365. So the probability of a match is

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}\frac{364}{365}\hspace{0.33em}{=}\hspace{0.33em}{0}{.}{00274}
\]

Very small! I wouldn’t bet on that happening. This is equivalent to the probability that two random people have the same birthday. Please bear with me here, but an equivalent expression that takes into account that we are rolling the die twice (or have two people in the room) is:

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{\frac{{2}\times{1}}{2}}{=}\hspace{0.33em}{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{1}\hspace{0.33em}{=}\hspace{0.33em}{0}{.}{00274}
\]

That expression in the exponent, (2×1)/2, is how to calculate the number of pairs that have a chance of being a success. Since we are just rolling the die twice (or there are just two people in a room), we only have 1 pair. If we roll the die 3 times, there are (3×2)/2 or 3 pairs of numbers to compare. Note that this exponent is generated by multiplying the number of rolls by one less, then dividing by 2. So for three rolls (3 people in a room), the chance of two numbers being the same is

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{\frac{{3}\times{2}}{2}}{=}\hspace{0.33em}{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{3}\hspace{0.33em}{=}\hspace{0.33em}{0}{.}{00}{82}
\]

Well, that appears to have increased the odds a bit. Let’s roll the die 10 times or have 10 people in a room. There are (10×9)/2 or 45 pairs that have a chance of being the same. So the probability in this case is

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{\frac{{10}\times{9}}{2}}{=}\hspace{0.33em}{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{45}\hspace{0.33em}{=}\hspace{0.33em}{0}{.}{1161}
\]

That means that in a group of 10 people, you have slightly better than an 11% chance that any two people have the same birthday. That really increased the chances with just a few more people! You can keep doing this for any number of rolls (people) using the formula

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{\frac{{n}{(}{n}{-}{1}{)}}{2}}
\]

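where n is the number of rolls or number of people in a room. Strictly speaking, this formula treats every pair as an independent comparison, so for more than two people it is a close approximation rather than the exact value. If you let n = 23, you will find that the chance of any two people having the same birthday is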

\[
{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{\frac{23(22)}{2}}\hspace{0.33em}{=}\hspace{0.33em}{1}\hspace{0.33em}{-}\hspace{0.33em}{\left({\frac{364}{365}}\right)}^{{253}\hspace{0.33em}}\hspace{0.33em}{=}\hspace{0.33em}{0}{.}{5005}
\]

You have better than a 50% chance that in a room of 23 people, two of them will have the same birthday! Mathematically, this is so because you have 253 pairs to compare, or 253 opportunities of a success. What a surprise!
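Here is a Python sketch that evaluates the pairs formula for the group sizes used above. As an extra that is not part of this post, it also computes the exact probability by multiplying the chances that each new person misses all the earlier birthdays; both function names are my own.

```python
def pair_formula(n):
    """The post's formula: 1 - (364/365) raised to the number of pairs, n(n - 1)/2."""
    pairs = n * (n - 1) // 2
    return 1 - (364 / 365) ** pairs


def exact_probability(n):
    """Exact chance of a shared birthday: 1 - P(all n birthdays are different)."""
    p_all_different = 1.0
    for k in range(n):
        # Person k + 1 must avoid the k birthdays already taken.
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different


for n in [2, 3, 10, 23]:
    print(f"n = {n:2d}: pairs formula {pair_formula(n):.4f}, exact {exact_probability(n):.4f}")
```

The two columns agree for two people and stay close for the other group sizes; either way, 23 people are enough to pass the 50% mark.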

Statistics and Data, Confidence Interval

Before I get to the core of today’s post, I would like to show the details of calculating the standard deviation. From my last post, we have two sets of data: one from Nice David and the other from Evil David. Let’s look at Nice David’s data which are the last five test scores of his students: 83, 83, 85, 87, and 90.

Now in my last post, I gave you the formula for calculating the standard deviation:

\[{\mathrm{standard}}\hspace{0.33em}{\mathrm{deviation}}\hspace{0.33em}\hspace{0.33em}{=}\hspace{0.33em}\sigma{=}\hspace{0.33em}\sqrt{\frac{\sum{{(}{x}\hspace{0.33em}{-}\hspace{0.33em}\overline{x}{)}^{2}}}{n}}\]

This says (much more elegantly than English) that I need to subtract the mean of the data from each data point, square each difference, add these up, divide by the number of data points, then take the square root of the result.

The mean of this data is 85.6. Taking the first data point of 83, subtracting 85.6 gives -2.6. Squaring this gives 6.76. I do this to each data point. After squaring each difference, I add these up and I get 35.2. Dividing this by the number of data points (5), I get 7.04. Then finally taking the square root, I get the standard deviation of 2.65. Doing the same thing to Evil David’s data gives standard deviation of 12.09. By the way, if you know how to use Excel, these calculations are very easy to do when you have lots of data.
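If you prefer code to Excel, here is a minimal Python sketch that follows the same steps one at a time. The function name is mine; the standard library’s `statistics.pstdev` should give the same result in one call.

```python
import math

nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]


def population_std_dev(data):
    """Standard deviation following the formula: divide by n, then take the square root."""
    n = len(data)
    mean = sum(data) / n
    # Subtract the mean from each data point and square the difference.
    squared_differences = [(x - mean) ** 2 for x in data]
    # Add these up and divide by the number of data points.
    variance = sum(squared_differences) / n
    return math.sqrt(variance)


print(f"Nice David: {population_std_dev(nice_david):.2f}")
print(f"Evil David: {population_std_dev(evil_david):.2f}")
```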

Now this post is about one of the ways you can use the standard deviation to make a decision as to which tutor you should use.

Now there is a lot of development that I will be leaving out here and I will also be making several assumptions to simplify the presentation, but the final result is still valid.

I will be assuming that the data from each tutor is normally distributed, which means we can make certain statements about the standard deviations. This is not an extreme assumption as this is usually assumed in statistics.

For data that is normally distributed, an interval of the mean minus one standard deviation to the mean plus one standard deviation contains 68% of the data. So for Nice David’s data, the mean minus one standard deviation is 85.6 – 2.65 = 82.95. The mean plus one standard deviation is 85.6 + 2.65 = 88.25. Now your test score will be another data point in Nice David’s data. Though you do not know what your test score will be, based on Nice David’s historical data, you can be 68% confident that your test score will be between 82.95 and 88.25.

The interval of all numbers between 82.95 and 88.25 is called a confidence interval. In this case, it is a 68% confidence interval. Let’s calculate Evil David’s 68% confidence interval.

Evil David’s standard deviation is 12.09, so his interval is 85.6 – 12.09 = 73.51 to 85.6 + 12.09 = 97.69. So with Evil David, you can be 68% confident that your test score will be between 73.51 and 97.69.

Now most calculations in statistics center around the 95% confidence interval. For the normally distributed data that we are assuming here, that is an interval of two standard deviations on either side of the mean. So for Nice David, the 95% interval is 85.6 – 2×2.65 = 80.3 to 85.6 + 2×2.65 = 90.9. So with Nice David as your tutor, you can be 95% confident that your test score will be between 80.3 and 90.9.

What about Evil David? His 95% interval is 85.6 – 2×12.09 = 61.42 to 85.6 + 2×12.09 = 109.78. Now it’s not possible to get over 100 so you can be 95% confident that with Evil David as your tutor, your test score will be between 61.42 and 100.

Who do you choose? If 65 is a passing score on your test, you would be risking a failing grade with Evil David. Not so much with Nice David. Though you do have a chance of getting a very high score with Evil David (if you like lotteries), you can be 95% confident of a passing score with Nice David. Being the unbiased person that I am, I would go with the tutor with more consistent results!
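The interval arithmetic is simple enough to sketch in Python. The helper below is my own; it takes the mean and the rounded standard deviations quoted in this post as given, uses 1 or 2 standard deviations for the 68% and 95% intervals, and caps the upper end at a maximum score of 100.

```python
MEAN = 85.6
PASSING_SCORE = 65
tutors = {"Nice David": 2.65, "Evil David": 12.09}  # standard deviations from the post


def interval(mean, std_dev, k, max_score=100.0):
    """Mean minus/plus k standard deviations, rounded, with the top capped at max_score."""
    low = mean - k * std_dev
    high = min(mean + k * std_dev, max_score)
    return round(low, 2), round(high, 2)


for name, std_dev in tutors.items():
    low_68, high_68 = interval(MEAN, std_dev, 1)
    low_95, high_95 = interval(MEAN, std_dev, 2)
    safe = "yes" if low_95 >= PASSING_SCORE else "no"
    print(f"{name}: 68% {low_68} to {high_68}, 95% {low_95} to {high_95}, "
          f"whole 95% interval passes: {safe}")
```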

Statistics and Data, Dispersion

Before I begin the topic of dispersion, I want to illustrate the power of the maths language. Unlike most words in English, maths words (notation) build upon other maths words, making the maths language very efficient at talking about maths. For example, in my last post, the mean was defined as

\[\overline{x}\hspace{0.33em}{=}\hspace{0.33em}\frac{\sum{x}}{n}\]

This is so much more elegant and succinct than “the mean of a set of data is the sum of all the data points divided by the number of data points”. The maths definition is much shorter because the symbols build upon prior things you have learned, specifically what Σ means and the concepts of addition and division. This power of maths notation allows us to conceptualise and design very complex things like the search algorithms used by Google and the sending of spacecraft to Mars. Now before I get more excited, let’s go on to today’s topic.

Suppose you have to choose between two maths tutors: myself (David the Maths Tutor) or my competitor, Evil David the Maths Tutor. They both publish the last 5 test scores from their students. My published scores are 83, 83, 85, 87, and 90. Evil David’s published scores are 71, 73, 86, 99, and 99. Which one do you choose?

If you’ve been paying attention, you would think that maybe you should find the mean of both sets of data. If you do, you will find that they both have the same mean of 85.6. You may be attracted to Evil David because of the high scores, but then notice that there are lower scores as well. You also notice that Nice David (me) has more consistent results. Wouldn’t it be nice if we could measure this quality of data, that is, have some measure of how spread out the data is? Well there is, otherwise I wouldn’t be writing this post.

Just as there are several measures of central tendency, there are several measures for the spread or dispersion of the data. The range is the difference between the largest and the smallest data point. This is not used much as it depends on only two of the data points; how the data is spread between those two points does not affect it. What about the average difference between each data point and the data’s mean? The problem with that is that some of the differences would be negative and some positive, and these cancel when added together. In fact, the differences from the mean always add up to exactly zero. But this is the right idea. If you square each difference first, none of the numbers is negative, and you can then take the average of these squared differences. This measurement is called the variance and the formal formula for this is

\[{\mathrm{Variance}}\hspace{0.33em}\hspace{0.33em}{=}\hspace{0.33em}{\sigma}^{2}{=}\hspace{0.33em}\frac{\sum{{(}{x}\hspace{0.33em}{-}\hspace{0.33em}\overline{x}{)}^{2}}}{n}\]

So you first calculate the mean of the data, then subtract that mean from each of the data points and square that difference, sum all these squared values together, then divide by the number of data points. Again, saying this in the language of maths is so much more elegant. Now Nice David’s data variance is 7.04 and Evil David’s variance is 146.24.
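The steps just described can be sketched in a few lines of Python. This is only an illustration under my own naming (`variance`, `x_bar`, `squared_diffs` are not from the post), and it divides by n exactly as the formula above does.

```python
def variance(data):
    """Variance as defined above: average of the squared differences from the mean."""
    n = len(data)
    x_bar = sum(data) / n                              # step 1: the mean
    squared_diffs = [(x - x_bar) ** 2 for x in data]   # step 2: subtract and square
    return sum(squared_diffs) / n                      # steps 3 and 4: sum, divide by n


nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]

print(round(variance(nice_david), 2))  # 7.04
print(round(variance(evil_david), 2))  # 146.24
```

If you would rather not write this yourself, Python's standard library has `statistics.pvariance`, which also divides by n. Be aware that `statistics.variance` divides by n - 1 instead, so it gives a different answer from the formula used here.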

Now the problem with the variance is that if the data has units like meters, the variance has units of meters squared since we are squaring the differences. It would be nice to have a dispersion measurement with the same units as the data. As you may have noticed, the symbol for variance is σ². σ is the lower case version of Σ so it is also called “sigma”. So you may hear the variance called “sigma-squared”. If we take the square root of the variance, the units will now be the same as the data and this is in fact done. The square root of the variance is called the standard deviation. So the formula for the standard deviation is the same as the variance except that you take the square root as the final step:

\[{\mathrm{standard}}\hspace{0.33em}{\mathrm{deviation}}\hspace{0.33em}\hspace{0.33em}{=}\hspace{0.33em}\sigma{=}\hspace{0.33em}\sqrt{\frac{\sum{{(}{x}\hspace{0.33em}{-}\hspace{0.33em}\overline{x}{)}^{2}}}{n}}\]

The standard deviation of Nice David’s data is 2.65. Evil David’s standard deviation is 12.09. In my next post, I will show you how to use these numbers to make an informed decision. (Hint: It doesn’t look good for Evil David).
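As a quick check of these two numbers, here is a small Python sketch of the standard deviation formula above. The function name `standard_deviation` is my own, and the code divides by n before taking the square root, matching the formula in this post.

```python
import math


def standard_deviation(data):
    """Square root of the variance (dividing by n, as in the formula above)."""
    n = len(data)
    x_bar = sum(data) / n
    variance = sum((x - x_bar) ** 2 for x in data) / n
    return math.sqrt(variance)


nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]

print(round(standard_deviation(nice_david), 2))  # 2.65
print(round(standard_deviation(evil_david), 2))  # 12.09
```

The standard library function `statistics.pstdev` computes the same quantity, so it can be used to double-check the results.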

Statistics and Data, A Mean Post

I thought I’d start a series on data and how data is consolidated in statistics. My previous posts on statistics were about probability. Probability will enter into the discussion later.

So having a set of data to analyse is what the bulk of a statistics course is about. One of the first things taught is Measures of Central Tendency. These are ways of consolidating all the data into one number. You are familiar with one of these: the average or mean. The term “mean” is the mathematician’s term for “average”. As you know, the average is the sum of all the given data divided by the number of data points you have. So for example, the average of 4, 5, and 6 is (4 + 5 + 6)/3 = 15/3 = 5. Now there are other measures of central tendency as well: median and mode. I won’t cover these as they are not as important as the mean in most statistical operations.

I would like to introduce some notation. The formulas used in statistics usually involve summing things. So to indicate a sum, the Greek letter sigma, Σ, is used. Sigma is the Greek version of the English “S” which is appropriate as it is the first letter in “Sum”. Also, the letter n is typically used to indicate the number of data points and xᵢ is used to represent the data. The letter i is called a subscript. The subscript i represents the generic data point. You can replace the i with a number to represent a particular data point. In our case, x₁ = 4, x₂ = 5, and x₃ = 6. So the formula would look like:

\[\overline{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

where the numerator notation means “add up all the xᵢ’s, changing the i from 1 to n”. Notice the notation for the mean, a bar over the x. This is pronounced “x bar”. This notation for the mean will be used frequently in my next posts.

It is usually understood from the context that we are summing over all the data, so you may just see Σx in the numerator without the i or n.
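If it helps to see the notation in action, here is a rough Python sketch of “add up all the xᵢ’s, changing the i from 1 to n” for the data 4, 5, and 6. The variable names are just my choices for illustration.

```python
data = [4, 5, 6]  # x_1 = 4, x_2 = 5, x_3 = 6
n = len(data)     # n is the number of data points

# The sigma in the numerator: let i run from 1 to n and add up each x_i.
total = 0
for i in range(1, n + 1):
    total += data[i - 1]  # Python lists start at index 0, so x_i is data[i - 1]

x_bar = total / n  # the mean, "x bar"
print(x_bar)       # 5.0

# The shorthand without the i or n corresponds to Python's built-in sum().
print(sum(data) / n)  # 5.0
```

The loop spells out what Σ means step by step, while `sum(data)` plays the role of the shorter Σx.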

So 4, 5, and 6 have a mean of 5, but so do 1, 5 and 9. The second set of data is spread out more and it would be nice to have a measure of this as well to use with the mean. That will be the subject of my next post.