Statistics and Data, A Mean Post

I thought I’d start a series on data and how data is consolidated in statistics. My previous posts on statistics were about probability. Probability will enter into the discussion later.

Analysing a set of data is what the bulk of a statistics course is about. Among the first things taught are Measures of Central Tendency. These are ways of consolidating all the data into one number. You are familiar with one of these: the average, or mean. The term “mean” is the mathematician’s term for “average”. As you know, the average is the sum of all the given data divided by the number of data points you have. So for example, the average of 4, 5, and 6 is (4 + 5 + 6)/3 = 15/3 = 5. There are other measures of central tendency as well: median and mode. I won’t cover these as they are not as important as the mean in most statistical operations.

I would like to introduce some notation. The formulas used in statistics usually involve summing things. So to indicate a sum, the Greek letter sigma, Σ, is used. Sigma is the Greek version of the English “S”, which is appropriate as it is the first letter in “Sum”. Also, the letter n is typically used to indicate the number of data points and xᵢ is used to represent the data. The letter i is called a subscript. The subscript i represents the generic data point. You can replace the i with a number to represent a particular data point. In our case, x₁ = 4, x₂ = 5, and x₃ = 6. So the formula would look like:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where the numerator notation means “add up all the xᵢ’s, changing the i from 1 to n”. Notice the notation for the mean, a bar over the x. This is pronounced “x bar”. This notation for the mean will be used frequently in my next posts.
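If you like to see things in code, here is a minimal Python sketch of the "add up all the xᵢ's, then divide by n" recipe. The function name `mean` and the variable names are my own choices for illustration, and the loop is written out by hand only so that it mirrors the subscript i running from 1 to n.

```python
def mean(data):
    """Return the arithmetic mean: the sum of the data divided by n."""
    n = len(data)
    if n == 0:
        raise ValueError("mean requires at least one data point")
    total = 0
    # Add up x_1 through x_n. Python lists start at index 0,
    # so the data point x_i is stored at data[i - 1].
    for i in range(1, n + 1):
        total += data[i - 1]
    return total / n


x = [4, 5, 6]       # x_1 = 4, x_2 = 5, x_3 = 6
x_bar = mean(x)     # "x bar"
print(x_bar)        # 5.0
```

In everyday Python you would more likely write `sum(data) / len(data)`; the explicit loop is here just to match the notation.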

It is usually understood from the context that we are summing over all the data, so you may just see Σx in the numerator without the i or n.

So 4, 5, and 6 have a mean of 5, but so do 1, 5 and 9. The second set of data is spread out more and it would be nice to have a measure of this as well to use with the mean. That will be the subject of my next post.
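As a quick check of that claim, here is a short Python sketch using the standard library's `statistics.mean`. The list names `first` and `second` are just labels I picked for the two data sets.

```python
from statistics import mean

first = [4, 5, 6]
second = [1, 5, 9]

# Both data sets consolidate to the same single number...
same_mean = mean(first) == mean(second)
print(same_mean)  # True

# ...even though the data themselves are clearly different.
print(first == second)  # False
```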