Statistics and Data, Confidence Interval

Before I get to the core of today’s post, I would like to show the details of calculating the standard deviation. From my last post, we have two sets of data: one from Nice David and the other from Evil David. Let’s look at Nice David’s data, which are the last five test scores of his students: 83, 83, 85, 87, and 90.

Now in my last post, I gave you the formula for calculating the standard deviation:

$\mathrm{standard\ deviation} = \sigma = \sqrt{\frac{\sum (x - \overline{x})^{2}}{n}}$

This says (much more elegantly than English) that I need to subtract the mean of the data from each data point, square each difference, add these up, divide by the number of data points, then take the square root of the result.

The mean of this data is 85.6. Taking the first data point of 83, subtracting 85.6 gives -2.6. Squaring this gives 6.76. I do this to each data point. After squaring each difference, I add these up and I get 35.2. Dividing this by the number of data points (5), I get 7.04. Then finally, taking the square root, I get a standard deviation of about 2.65. Doing the same thing to Evil David’s data gives a standard deviation of 12.09. By the way, if you know how to use Excel, these calculations are very easy to do when you have lots of data.
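If you would rather not use Excel, the same steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, using Nice David’s five scores; the variable names are my own choice.

```python
import math

# Nice David's last five test scores, as listed in this post.
scores = [83, 83, 85, 87, 90]

n = len(scores)
mean = sum(scores) / n

# Squared difference between each score and the mean.
squared_diffs = [(x - mean) ** 2 for x in scores]

# Divide by n (not n - 1), matching the formula in this post.
variance = sum(squared_diffs) / n
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"sum of squared differences = {sum(squared_diffs):.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Python’s built-in `statistics.pstdev(scores)` does the same calculation in one call.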

Now this post is about one of the ways you can use the standard deviation to make a decision as to which tutor you should use.

Now there is a lot of development that I will be leaving out here and I will also be making several assumptions to simplify the presentation, but the final result is still valid.

I will be assuming that the data from each tutor is normally distributed, which means we can make certain statements about the standard deviations. This is not an extreme assumption; it is one that is commonly made in statistics.

For data that is normally distributed, an interval from the mean minus one standard deviation to the mean plus one standard deviation contains about 68% of the data. So for Nice David’s data, the mean minus one standard deviation is 85.6 – 2.65 = 82.95. The mean plus one standard deviation is 85.6 + 2.65 = 88.25. Now your test score will be another data point in Nice David’s data. Though you do not know what your test score will be, based on Nice David’s historical data, you can be 68% confident that your test score will be between 82.95 and 88.25.

The interval of all numbers between 82.95 and 88.25 is called a confidence interval. In this case, it is a 68% confidence interval. Let’s calculate Evil David’s 68% confidence interval.

Evil David’s standard deviation is 12.09, so his interval is 85.6 – 12.09 = 73.51 to 85.6 + 12.09 = 97.69. So with Evil David, you can be 68% confident that your test score will be between 73.51 and 97.69.
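Here is a small Python sketch of the two 68% intervals. It simply takes the mean (85.6) and the two standard deviations (2.65 and 12.09) quoted in this post as given; the function name `interval` is just something I made up for the illustration.

```python
def interval(mean, std_dev, k=1):
    """Return (low, high) for the mean plus or minus k standard deviations."""
    return mean - k * std_dev, mean + k * std_dev

# Mean and standard deviations as quoted in this post.
nice_low, nice_high = interval(85.6, 2.65)
evil_low, evil_high = interval(85.6, 12.09)

print(f"Nice David, 68%: {nice_low:.2f} to {nice_high:.2f}")
print(f"Evil David, 68%: {evil_low:.2f} to {evil_high:.2f}")
```

The parameter `k` is the number of standard deviations on each side of the mean, so the same function works for wider intervals too.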

Now most calculations in statistics center around the 95% confidence interval. For the normally distributed data that we are assuming here, that is an interval of about two standard deviations on either side of the mean. So for Nice David, the 95% interval is 85.6 – 2×2.65 = 80.3 to 85.6 + 2×2.65 = 90.9. So with Nice David as your tutor, you can be 95% confident that your test score will be between 80.3 and 90.9.

What about Evil David? His 95% interval is 85.6 – 2×12.09 = 61.42 to 85.6 + 2×12.09 = 109.78. Now it’s not possible to get over 100, so you can be 95% confident that with Evil David as your tutor, your test score will be between 61.42 and 100.
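The 95% intervals, including the cap at 100, can be sketched in Python like this. I am assuming test scores run from 0 to 100, and the function name `clipped_interval` is my own invention for this example.

```python
def clipped_interval(mean, std_dev, k=2, lowest=0.0, highest=100.0):
    """Mean plus or minus k standard deviations, limited to possible scores.

    The 0-to-100 score range is an assumption made for this example.
    """
    low = max(lowest, mean - k * std_dev)
    high = min(highest, mean + k * std_dev)
    return low, high

# Mean and standard deviations as quoted in this post.
nice_95 = clipped_interval(85.6, 2.65)
evil_95 = clipped_interval(85.6, 12.09)

print(f"Nice David, 95%: {nice_95[0]:.2f} to {nice_95[1]:.2f}")
print(f"Evil David, 95%: {evil_95[0]:.2f} to {evil_95[1]:.2f}")
```

Nice David’s interval is untouched by the cap, while Evil David’s upper end of 109.78 is pulled back to 100.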

Who do you choose? If 65 is a passing score on your test, you would be risking a failing grade with Evil David. Not so much with Nice David. Though you do have a chance of getting a very high score with Evil David (if you like lotteries), your test score is 95% guaranteed to be a passing one with Nice David. Being the unbiased person that I am, I would go with the tutor with more consistent results!
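If you want to put a rough number on that risk, here is a sketch using Python’s standard `statistics.NormalDist`. It leans entirely on the assumptions already made in this post (normally distributed scores, a mean of 85.6 for both tutors, and a passing score of 65), so treat the result as an illustration rather than a prediction.

```python
from statistics import NormalDist

PASSING_SCORE = 65  # the passing score used as an example in this post

# Normal models built from the mean and standard deviations quoted above.
nice = NormalDist(mu=85.6, sigma=2.65)
evil = NormalDist(mu=85.6, sigma=12.09)

# cdf(x) is the probability of a score below x under the normal model.
p_fail_nice = nice.cdf(PASSING_SCORE)
p_fail_evil = evil.cdf(PASSING_SCORE)

print(f"Chance of failing with Nice David: {p_fail_nice:.2%}")
print(f"Chance of failing with Evil David: {p_fail_evil:.2%}")
```

Under these assumptions, the chance of failing with Evil David comes out at a few percent, while with Nice David it is practically zero.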