Statistics and Data, Dispersion

Before I begin the topic of dispersion, I want to illustrate the power of the maths language. Unlike most words in English, maths words (notation) build upon other maths words, which makes the maths language very efficient for talking about maths. For example, in my last post, the mean was defined as


$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$

This is so much more elegant and succinct than “the mean of a set of data is the sum of all the data points divided by the number of data points”. The maths definition is much shorter because the symbols build upon things you have already learned, specifically what Σ means and the concepts of addition and division. This power of maths notation allows us to conceptualise and design very complex things, like the search algorithms used by Google and the sending of spacecraft to Mars. Now, before I get more excited, let’s go on to today’s topic.

Suppose you have to choose between two maths tutors: myself (David the Maths Tutor) or my competitor, Evil David the Maths Tutor. Both publish the last 5 test scores from their students. My published scores are 83, 83, 85, 87, and 90. Evil David’s published scores are 71, 73, 86, 99, and 99. Which one do you choose?

If you’ve been paying attention, you might think that you should find the mean of both sets of data. If you do, you will find that they both have the same mean of 85.6. You may be attracted to Evil David because of the high scores, but then you notice that there are lower scores as well. You also notice that Nice David (me) has more consistent results. Wouldn’t it be nice if we could measure this quality of the data, that is, have some measure of how spread out the data is? Well, there is one, otherwise I wouldn’t be writing this post.
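
If you would like to check the arithmetic, here is a minimal Python sketch of the two means. The names nice_david, evil_david and mean are just my choices for illustration.

```python
# Last five test scores published by each tutor (taken from the text above).
nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]


def mean(data):
    """Sum of all the data points divided by the number of data points."""
    return sum(data) / len(data)


print(mean(nice_david))  # 85.6
print(mean(evil_david))  # 85.6
```

Both lists add up to 428, so dividing by 5 gives 85.6 in each case. The mean alone cannot tell the two tutors apart.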

As with averages, there are several measures of the spread, or dispersion, of the data. The range is the difference between the largest and the smallest data points. It is not used much because it depends on only two of the data points; the spread of the data between those two points does not affect the measurement. What about the average difference between each data point and the data’s mean? The problem with that is that some of the differences are negative and some positive, so when added together they cancel out (in fact, they always sum to exactly zero). But this is the right idea. If you square these differences and then take the average of the squares, none of the terms can be negative, so nothing cancels. This measurement is called the variance and the formal formula for this is


$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

So you first calculate the mean of the data, then subtract that mean from each of the data points and square that difference, sum all these squared values together, and finally divide by the number of data points. Again, saying this in the language of maths is so much more elegant. Nice David’s variance is 7.04 and Evil David’s variance is 146.24.
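
The steps just described can be sketched in a few lines of Python. This is only an illustration: the helper names data_range and variance are mine, and the variance here divides by the number of data points, exactly as in the description above.

```python
nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]


def mean(data):
    return sum(data) / len(data)


def data_range(data):
    """Largest data point minus the smallest data point."""
    return max(data) - min(data)


def variance(data):
    """Average of the squared differences between each point and the mean."""
    mu = mean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


for name, scores in [("Nice David", nice_david), ("Evil David", evil_david)]:
    mu = mean(scores)
    # The signed differences cancel out (zero, apart from floating-point rounding).
    signed_sum = sum(x - mu for x in scores)
    print(name, "range:", data_range(scores))
    print(name, "sum of signed differences is about zero:", abs(signed_sum) < 1e-9)
    print(name, "variance:", round(variance(scores), 2))
```

The ranges come out as 7 and 28, and the variances round to 7.04 and 146.24, matching the figures above.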

Now the problem with the variance is that if the data has units like meters, the variance has units of meters squared, since we are squaring the differences. It would be nice to have a dispersion measurement with the same units as the data. You may have noticed that the symbol for variance is σ². σ is the lower-case version of Σ, so it is also called “sigma”, and you may hear the variance called “sigma-squared”. If we take the square root of the variance, the units will be the same as those of the data, and this is in fact what is done. The square root of the variance is called the standard deviation. So the formula for the standard deviation is the same as the one for the variance, except that you take the square root as the final step:


$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

The standard deviation of Nice David’s data is 2.65. Evil David’s standard deviation is 12.09. In my next post, I will show you how to use these numbers to make an informed decision. (Hint: it doesn’t look good for Evil David.)
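
To check these last two numbers yourself, here is a small Python sketch. The function name std_dev is my own. I also compare the result with statistics.pstdev from Python’s standard library, which computes the same divide-by-n standard deviation used in this post.

```python
import math
import statistics

nice_david = [83, 83, 85, 87, 90]
evil_david = [71, 73, 86, 99, 99]


def std_dev(data):
    """Square root of the variance, with the variance dividing by n."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return math.sqrt(var)


for name, scores in [("Nice David", nice_david), ("Evil David", evil_david)]:
    by_hand = std_dev(scores)
    library = statistics.pstdev(scores)  # same definition, from the standard library
    print(name, round(by_hand, 2), round(library, 2))
```

This prints 2.65 for Nice David and 12.09 for Evil David, by both routes. Be aware that statistics.stdev (without the “p”) divides by n − 1 instead of n, so it will not reproduce the numbers in this post.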