Resolved: Why does the formula for sample variance have n-1 in the denominator instead of n?
I understand the concept of variation and the population variance, but why does the sample variance have n-1 in the denominator? What is the purpose of this?
I have come across a very sensible answer to this in a book. (Don't recall the book, but the explanation made so much sense that it stayed with me.)
Imagine you have a huge bookshelf. You measure the total thickness of the first 6 books and it turns out to be 158mm. This means that the mean thickness of a book, based on these 6 books, is about 26.3mm.
Now you take out the first book and measure its thickness (one degree of freedom) and find that it is 22mm. This means that the remaining 5 books must have a total thickness of 136mm.
Now you measure the second book (second degree of freedom) and find it to be 28mm. So you know that the remaining 4 books must have a total thickness of 108mm.
In this way, by the time you have measured the thickness of the 5th book individually (5th degree of freedom), you automatically know the thickness of the 1 remaining book: it is whatever is left of the 158mm total.
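The subtraction above can be sketched in a few lines of Python. Only the 158mm total and the first two thicknesses (22mm and 28mm) come from the example; the thicknesses of books 3 to 5 are made-up values, chosen just so the arithmetic can be followed to the end.

```python
# Thicknesses in mm. The total (158) and the first two books (22, 28) come
# from the example above; the next three values are hypothetical.
total = 158
measured = [22, 28, 25, 30, 24]

remaining = total
remaining_totals = []
for position, thickness in enumerate(measured, start=1):
    remaining -= thickness
    remaining_totals.append(remaining)
    books_left = 6 - position
    print(f"after book {position}: {books_left} book(s) left, {remaining}mm between them")

# Five measurements plus the known total pin down the sixth book exactly.
sixth_book = remaining
print(f"6th book must be {sixth_book}mm")
```

Whatever values you pick for the first five books, the sixth is forced to be the total minus their sum.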
This means that you automatically know the thickness of the 6th book even though you have measured only 5. Extrapolating this concept: in a sample of size n whose mean is already fixed, you know the value of the nth observation even though you have only taken (n-1) measurements, i.e. the opportunity to vary has been taken away from the nth observation.
So if you have measured (n-1) objects, the nth object has no freedom to vary. Therefore, the number of degrees of freedom is only (n-1) and not n. The sample variance is built from deviations from the sample mean, and those deviations must sum to zero, so only (n-1) of them are free. That is why the sum of squared deviations is divided by n-1.
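To see what the lost degree of freedom does to the variance itself, here is a rough simulation sketch. It assumes, purely for illustration, that book thicknesses are normally distributed with mean 26mm and standard deviation 4mm (neither number is from the example), draws many samples of 6 books, and averages the two candidate estimates.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 26.0          # hypothetical population mean (mm)
TRUE_SD = 4.0             # hypothetical population standard deviation (mm)
TRUE_VAR = TRUE_SD ** 2   # true variance the estimates are aiming at
N = 6                     # sample size, as in the six-book example
TRIALS = 20000

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sample_mean = sum(sample) / N
    # Squared deviations are taken from the SAMPLE mean, not the true mean.
    squares = sum((x - sample_mean) ** 2 for x in sample)
    sum_div_n += squares / N
    sum_div_n_minus_1 += squares / (N - 1)

avg_div_n = sum_div_n / TRIALS
avg_div_n_minus_1 = sum_div_n_minus_1 / TRIALS
print(f"true variance:          {TRUE_VAR:.2f}")
print(f"average dividing by n:   {avg_div_n:.2f}")
print(f"average dividing by n-1: {avg_div_n_minus_1:.2f}")
```

Under these assumptions, dividing by n should come out low on average, by roughly a factor of (n-1)/n, while dividing by n-1 should land close to the true variance.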
NOTE: Suppose your mother told you that she had calculated the mean thickness of all the books in the bookshelf and found it to be 25.8mm. (Let's call this mean(mom), and your original mean of 26.3mm mean(you).)
If you use her value and perform the same experiment as above, then even the 6th observation can vary, because it is not necessarily true that
mean(mom)*n = total thickness of n books.
Whereas, since you calculated mean(you) from the n books themselves, it automatically follows that mean(you)*n = total thickness of n books.
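A small check of this difference, as a sketch: the six thicknesses below are hypothetical except for the first two (22mm and 28mm) and the 158mm total, and 25.8 is mean(mom) from the note.

```python
# Hypothetical thicknesses in mm; only 22, 28 and the 158 total are from the example.
books = [22, 28, 25, 30, 24, 29]
n = len(books)
total = sum(books)

mean_you = total / n   # computed from these very books, about 26.3
mean_mom = 25.8        # computed from the whole bookshelf

# Deviations from mean(you) are forced to cancel out; those from mean(mom) are not.
sum_dev_you = sum(x - mean_you for x in books)
sum_dev_mom = sum(x - mean_mom for x in books)

print(f"mean(you)*n = {mean_you * n:.1f}, total = {total}")
print(f"mean(mom)*n = {mean_mom * n:.1f}, total = {total}")
print(f"sum of deviations from mean(you): {sum_dev_you:.6f}")
print(f"sum of deviations from mean(mom): {sum_dev_mom:.6f}")
```

Because mean(mom) puts no constraint on these six books, all n deviations from it are free. This is the usual reason for dividing by n rather than n-1 when the population mean is known.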