Statistics and Data, Confidence Interval

Before I get to the core of today’s post, I would like to show the details of calculating the standard deviation. From my last post, we have two sets of data: one from Nice David and the other from Evil David. Let’s look at Nice David’s data, which are the last five test scores of his students: 83, 83, 85, 87, and 90.

Now in my last post, I gave you the formula for calculating the standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

This says (much more elegantly than English) that I need to subtract the mean of the data from each data point, square each difference, add these up, divide by the number of data points, then take the square root of the result.

The mean of this data is 85.6. Taking the first data point of 83 and subtracting 85.6 gives -2.6. Squaring this gives 6.76. I do this to each data point. After squaring each difference, I add these up and I get 35.2. Dividing this by the number of data points (5), I get 7.04. Then finally, taking the square root, I get the standard deviation of 2.65. Doing the same thing to Evil David’s data gives a standard deviation of 12.09. By the way, if you know how to use Excel, these calculations are very easy to do even when you have lots of data.
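If you prefer code to a spreadsheet, the same steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names are my own choices and nothing here is specific to any particular library.

```python
import math

scores = [83, 83, 85, 87, 90]  # Nice David's last five test scores

# Step 1: the mean of the data
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each data point and square the difference
squared_diffs = [(x - mean) ** 2 for x in scores]

# Step 3: add up the squared differences
total = sum(squared_diffs)

# Step 4: divide by the number of data points
variance = total / len(scores)

# Step 5: take the square root
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"sum of squared differences = {total:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Swapping in Evil David’s scores (71, 73, 86, 99, 99) for `scores` repeats the calculation for his data.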

Now this post is about one of the ways you can use the standard deviation to make a decision as to which tutor you should use.

Now there is a lot of development that I will be leaving out here and I will also be making several assumptions to simplify the presentation, but the final result is still valid.

I will be assuming that the data from each tutor is normally distributed, which means we can make certain statements about the standard deviations. This is not an extreme assumption as this is usually assumed in statistics.

For data that is normally distributed, the interval from the mean minus one standard deviation to the mean plus one standard deviation contains about 68% of the data. So for Nice David’s data, the mean minus one standard deviation is 85.6 – 2.65 = 82.95. The mean plus one standard deviation is 85.6 + 2.65 = 88.25. Now your test score will be another data point in Nice David’s data. Though you do not know what your test score will be, based on Nice David’s historical data, you can be about 68% confident that your test score will be between 82.95 and 88.25.

The interval of all numbers between 82.95 and 88.25 is called a confidence interval. In this case, it is a 68% confidence interval. Let’s calculate Evil David’s 68% confidence interval.

Evil David’s standard deviation is 12.09, so his interval is 85.6 – 12.09 = 73.51 to 85.6 + 12.09 = 97.69. So with Evil David, you can be 68% confident that your test score will be between 73.51 and 97.69.
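The two 68% intervals can be reproduced with a small Python sketch. The helper name `interval` and its parameter `k` (the number of standard deviations either side of the mean) are my own, chosen just for this illustration, and the inputs are the rounded values quoted in the text.

```python
def interval(mean, std_dev, k=1):
    """Return (low, high) for the mean plus or minus k standard deviations."""
    return mean - k * std_dev, mean + k * std_dev

mean = 85.6  # both tutors have the same mean

nice_low, nice_high = interval(mean, 2.65)
evil_low, evil_high = interval(mean, 12.09)

print(f"Nice David: {nice_low:.2f} to {nice_high:.2f}")
print(f"Evil David: {evil_low:.2f} to {evil_high:.2f}")
```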

Now most calculations in statistics center around the 95% confidence interval. For the normally distributed data that we are assuming here, that is the interval extending approximately two standard deviations either side of the mean. So for Nice David, the 95% interval is 85.6 – 2×2.65 = 80.3 to 85.6 + 2×2.65 = 90.9. So with Nice David as your tutor, you can be 95% confident that your test score will be between 80.3 and 90.9.
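If you are wondering where the 68% and 95% figures come from, they are properties of the normal distribution itself. A quick way to check them, assuming Python 3.8 or later (where `statistics.NormalDist` is available), is sketched below. The helper `coverage` is a name I made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 standard deviation:  {coverage(1):.4f}")
print(f"within 2 standard deviations: {coverage(2):.4f}")
```

The second figure comes out slightly above 95%, which is why two standard deviations is a convenient approximation rather than an exact rule.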

What about Evil David? His 95% interval is 85.6 – 2×12.09 = 61.42 to 85.6 + 2×12.09 = 109.78. Now it’s not possible to get over 100, so you can be 95% confident that with Evil David as your tutor, your test score will be between 61.42 and 100.
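Here is a rough Python sketch of the 95% intervals for both tutors, including the cap at 100. The function name `capped_interval` and the parameter `max_score` are hypothetical names for this illustration, and the inputs are the rounded values from the text.

```python
def capped_interval(mean, std_dev, k=2, max_score=100):
    """Mean plus or minus k standard deviations, with the top capped at max_score."""
    low = mean - k * std_dev
    high = min(mean + k * std_dev, max_score)  # a score cannot exceed max_score
    return low, high

mean = 85.6

nice_low, nice_high = capped_interval(mean, 2.65)
evil_low, evil_high = capped_interval(mean, 12.09)

print(f"Nice David: {nice_low:.2f} to {nice_high:.2f}")
print(f"Evil David: {evil_low:.2f} to {evil_high:.2f}")
```

The cap only changes Evil David’s upper limit; Nice David’s interval sits comfortably below 100.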

Who do you choose? If 65 is a passing score on your test, you would be risking a failing grade with Evil David. Not so much with Nice David. Though you do have a chance of getting a very high score with Evil David (if you like lotteries), your test score is 95% guaranteed to be a passing one with Nice David. Being the unbiased person that I am, I would go with the tutor with more consistent results!
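To put a rough number on that risk, here is a sketch that estimates the chance of scoring below 65 with each tutor. It leans entirely on the normality assumption made earlier and on the rounded mean and standard deviations from this post, so treat the output as an illustration rather than a prediction. It assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist

passing_score = 65

# Model each tutor's results as a normal distribution (an assumption)
nice = NormalDist(mu=85.6, sigma=2.65)
evil = NormalDist(mu=85.6, sigma=12.09)

# cdf(x) is the probability of a value at or below x
nice_fail = nice.cdf(passing_score)
evil_fail = evil.cdf(passing_score)

print(f"Chance of failing with Nice David: {nice_fail:.4f}")
print(f"Chance of failing with Evil David: {evil_fail:.4f}")
```

Under these assumptions, the chance of failing with Evil David comes out at roughly 4%, while with Nice David it is essentially zero.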

Statistics and Data, Dispersion

Before I begin the topic of dispersion, I want to illustrate the power of the maths language. Unlike most words in English, maths words (notation) build upon other maths words, making the maths language very efficient at talking about maths. For example, in my last post, the mean was defined as

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

This is so much more elegant and succinct than “the mean of a set of data is the sum of all the data points divided by the number of data points”. The maths definition is much shorter because the symbols build upon prior things you have learned, specifically what Σ means and the concepts of addition and division. This power of maths notation allows us to conceptualise and design very complex things like the search algorithms used by Google and the sending of spacecraft to Mars. Now before I get more excited, let’s go on to today’s topic.

Suppose you have to choose between two maths tutors: myself (David the Maths Tutor) or my competitor, Evil David the Maths Tutor. They both publish the last 5 test scores from their students. My published scores are 83, 83, 85, 87, and 90. Evil David’s published scores are 71, 73, 86, 99, and 99. Which one do you choose?

If you’ve been paying attention, you would think that maybe you should find the mean of both sets of data. If you do, you will find that they both have the same mean of 85.6. You may be attracted to Evil David because of the high scores, but then notice that there are lower scores as well. You also notice that Nice David (me) has more consistent results. Wouldn’t it be nice if we could measure this quality of the data, that is, have some measure of how spread out the data is? Well there is, otherwise I wouldn’t be writing this post.
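You can check the two means yourself with a short Python sketch using the standard library’s `statistics.mean`. The list names are my own.

```python
from statistics import mean

nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]

nice_mean = mean(nice_david)
evil_mean = mean(evil_david)

print(f"Nice David's mean: {nice_mean}")
print(f"Evil David's mean: {evil_mean}")
```

Both come out the same, so the mean alone cannot separate the two tutors.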

Just like the mean, there are several measures for the spread or dispersion of the data. The range is the difference between the largest and the smallest data point. This is not used too much as it is only affected by two of the data points; the spread of the data between these two points does not affect the measurement. What about the average difference between each data point and the data’s mean? The problem with that is that some of the differences would be negative and some positive, so they cancel when added together (in fact, they always add up to exactly zero). But this is the right idea. If you squared these differences and took the average of these squared differences, you would be adding only non-negative numbers. This measurement is called the variance and the formal formula for this is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

So you first calculate the mean of the data, then subtract that mean from each of the data points and square that difference, sum all these squared values together, then divide by the number of data points. Again, saying this in the language of maths is so much more elegant. Nice David’s data has a variance of 7.04 and Evil David’s has a variance of 146.24.
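The three ideas discussed here (the range, the plain differences from the mean, and the variance) can be compared side by side in a short Python sketch. The function names are my own, and the variance divides by the number of data points, as described above.

```python
def data_range(data):
    """Largest data point minus the smallest."""
    return max(data) - min(data)

def sum_of_differences(data):
    """Sum of (data point - mean); positives and negatives cancel."""
    mean = sum(data) / len(data)
    return sum(x - mean for x in data)

def variance(data):
    """Average of the squared differences from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]

for name, data in [("Nice David", nice_david), ("Evil David", evil_david)]:
    print(name)
    print(f"  range: {data_range(data)}")
    print(f"  sum of differences: {sum_of_differences(data):.2f}")
    print(f"  variance: {variance(data):.2f}")
```

The sum of the plain differences is zero (up to floating-point rounding) for both tutors, which is why it tells you nothing about spread, while the variances are very different.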

Now the problem with the variance is that if the data has units like meters, the variance has units of meters squared, since we are squaring the differences. It would be nice to have a dispersion measurement with the same units as the data. If you noticed, the symbol for variance is σ². σ is the lower case version of Σ so it is also called “sigma”. So you may hear the variance called “sigma-squared”. If we take the square root of the variance, the units will now be the same as the data and this is in fact done. The square root of the variance is called the standard deviation. So the formula for the standard deviation is the same as the variance except that you take the square root as the final step:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

The standard deviation of Nice David’s data is 2.65. Evil David’s standard deviation is 12.09. In my next post, I will show you how to use these numbers to make an informed decision. (Hint: It doesn’t look good for Evil David).
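Python’s standard library can do this calculation for you. The sketch below uses `statistics.pstdev`, which divides by the number of data points, matching the formula in this post. Note that the similarly named `statistics.stdev` divides by one less than the number of data points, so it would give slightly larger numbers.

```python
from statistics import pstdev

nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]

# pstdev: square root of the average squared difference from the mean
nice_sd = pstdev(nice_david)
evil_sd = pstdev(evil_david)

print(f"Nice David: {nice_sd:.2f}")
print(f"Evil David: {evil_sd:.2f}")
```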