The variance \(\mathbb{V}X\) and mean \(\mathbb{E}X\) of the population are often unknown
Usually we only have a small sample \(\mathbf{X} = (X_1,\ldots,X_n)\)
Assuming that all \(X_i\) are taken from the same population and are mutually independent, what can we say about the sample mean and variance?
The variance of a set of numbers is easy to calculate
\[\begin{aligned}\text{var}(X_1,\ldots,X_n)& =\frac{1}{n}\sum_i (X_i-\bar{\mathbf{X}})^2\\ &=\frac{1}{n}\sum_i X_i^2-\bar{\mathbf{X}}^2\end{aligned}\]
(the average of the squares minus the square of the average)
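The two forms of the formula can be compared numerically. This is a minimal sketch in plain Python; the function names and the numbers in `sample` are made up for illustration.

```python
def variance_by_deviations(xs):
    """Average of the squared deviations from the mean (1/n factor)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_by_squares(xs):
    """Average of the squares minus the square of the average."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2


# Illustrative data only: the mean is 5 and the squared deviations sum to 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance_by_deviations(sample))  # 4.0
print(variance_by_squares(sample))     # 4.0
```

With floating-point data the second form can lose precision when the mean is large compared with the spread, so the first form is usually the safer one to compute with.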
Since the sample is random, this is a random variable. What is its expected value?
Since \(\mathbb{E}(\alpha X+\beta Y)=\alpha\mathbb{E}(X)+\beta\mathbb{E}(Y),\) we have
\[\begin{aligned} \mathbb{E}\,\text{var}(X_1,\ldots,X_n)&=\mathbb{E}\left(\frac{1}{n}\sum_i X_i^2-\left(\frac{1}{n}\sum_i X_i\right)^2\right)\\ &=\frac{1}{n}\sum_i \mathbb{E}\left(X_i^2\right)-\frac{1}{n^2}\mathbb{E} \left(\left(\sum_i X_i\right)^2\right)\end{aligned}\]
Now, since the sample is i.i.d. we have \(\mathbb{E} \left(X_i^2\right)=\mathbb{E} \left(X^2\right)\) and \[\sum_i\mathbb{E} \left(X_i^2\right)=n\mathbb{E} \left(X^2\right)\] therefore \[\mathbb{E}\,\text{var}(X_1,\ldots,X_n)=\frac{1}{n}n \mathbb{E}\left(X^2\right)-\frac{1}{n^2}\mathbb{E} \left(\left(\sum_i X_i\right)^2\right)\]
We can expand the second term as \[\left(\sum_i X_i\right)^2=\left(\sum_i X_i\right)\left(\sum_j X_j\right)=\sum_i \sum_j X_i X_j\] therefore \[\mathbb{E} \left(\left(\sum_i X_i\right)^2\right)=\sum_i \sum_j \mathbb{E}(X_i X_j)\] Here we have two cases.
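If \(i=j,\) we have \[\mathbb{E}(X_i X_j) = \mathbb{E} (X_i^2)=\mathbb{E} (X^2)\] If \(i\neq j,\) since all outcomes are independent, we have \[\mathbb{E}(X_i X_j) = \mathbb{E}(X_i)\mathbb{E}(X_j)=(\mathbb{E}X)^2\] The double sum has \(n\) terms with \(i=j\) and \(n(n-1)\) terms with \(i\neq j,\) therefore \[\mathbb{E}\left(\left(\sum_i X_i\right)^2\right)= n \mathbb{E} (X^2) + n(n-1)(\mathbb{E} X)^2\]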
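As a quick check of the term counting, take \(n=2\): \[\left(X_1+X_2\right)^2 = X_1^2 + X_2^2 + 2X_1X_2\] so, by linearity and independence, \[\mathbb{E}\left(\left(X_1+X_2\right)^2\right) = 2\,\mathbb{E}(X^2) + 2\,(\mathbb{E}X)^2\] which matches \(n\,\mathbb{E}(X^2) + n(n-1)(\mathbb{E}X)^2\) for \(n=2\)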
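Substituting this into the expression for \(\mathbb{E}\,\text{var}(X_1,\ldots,X_n)\) gives
\[\begin{aligned} \mathbb{E}\,\text{var}(X_1,\ldots,X_n)&=\frac{1}{n}n\mathbb{E}(X^2)-\frac{1}{n^2}\left(n \mathbb{E} (X^2) + n(n-1)(\mathbb{E} X)^2\right)\\ & =\frac{1}{n}\left((n-1)\mathbb{E}(X^2)-(n-1)(\mathbb{E} X)^2\right)\\ & =\frac{n-1}{n}\left(\mathbb{E}(X^2)-(\mathbb{E} X)^2\right)\\ & =\frac{n-1}{n}\mathbb{V}X \end{aligned}\]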
We have found that \[\mathbb{E}\,\text{var}(X_1,\ldots,X_n)= \frac{n-1}{n}\mathbb{V}X\] which is not exactly what we are looking for: on average, the sample variance underestimates the population variance
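This result can be verified exactly on a small case. The sketch below assumes a fair six-sided die as the population and samples of size \(n=3\); both choices, and the helper name `var_n`, are only for illustration. It lists every possible sample, each equally likely, and averages their variances with exact fractions.

```python
from fractions import Fraction
from itertools import product


def var_n(xs):
    """Variance with the 1/n factor, computed with exact fractions."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


faces = [1, 2, 3, 4, 5, 6]  # assumed population: one roll of a fair die
n = 3                       # assumed sample size

# Each face is equally likely, so the population variance is var_n(faces).
population_variance = var_n(faces)

# All 6**n samples are equally likely, so the expected value of the
# sample variance is the plain average over all of them.
samples = list(product(faces, repeat=n))
expected_var = sum(var_n(s) for s in samples) / len(samples)

print(population_variance)                       # 35/12
print(expected_var)                              # 35/18
print(Fraction(n - 1, n) * population_variance)  # 35/18
```

The last two lines agree, as the derivation predicts, and both are smaller than the population variance.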
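If we want to estimate the mean \(\mathbb{E}X\) of a population we can use the sample mean \(\bar{\mathbf{X}}\)
But if we want to estimate the variance \(\mathbb{V}X\) of a population we cannot use the sample variance \(\text{var}(X_1,\ldots,X_n)\) directly, because its expected value is \(\frac{n-1}{n}\mathbb{V}X\)
Instead we have to use a different formula, which rescales the sample variance by \(\frac{n}{n-1}\) \[\hat{\mathbb{V}}(X) = \frac{1}{n-1}\sum_i(X_i-\bar{\mathbf{X}})^2\]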
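People use two formulas, depending on the case
+ If you only care about the sample \(\mathbf{x}\), its variance is \[\text{var}(\mathbf{x}) =\frac{1}{n}\sum_i (x_i-\bar{\mathbf{x}})^2=\frac{1}{n}\sum_i x_i^2-\bar{\mathbf{x}}^2\]
+ If you want to estimate the variance of the population that the sample came from, use \[\hat{\mathbb{V}}(X) = \frac{1}{n-1}\sum_i(x_i-\bar{\mathbf{x}})^2\]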
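Python's standard `statistics` module provides both conventions: `statistics.pvariance` divides by \(n\) and `statistics.variance` divides by \(n-1\). The data below are made up for illustration.

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative numbers, not real measurements
n = len(x)

# Variance of these numbers themselves (1/n factor).
var_of_sample = statistics.pvariance(x)

# Estimate of the variance of the population they came from (1/(n-1) factor).
var_estimate = statistics.variance(x)

print(var_of_sample)
print(var_estimate)

# The two values differ exactly by the factor n/(n-1).
print(abs(var_estimate - var_of_sample * n / (n - 1)) < 1e-12)  # True
```

Other libraries choose different defaults, so it is worth checking which factor a given function uses before comparing results.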
When experiments produce numbers we can calculate their average and variance
The population has a fixed mean and variance, even if we do not know their values
If we have an i.i.d. sample we can estimate the population mean with the sample mean
If the sample is not i.i.d., its mean may not correspond to the population mean
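With high probability the sample mean is close to the population mean, whatever the probability distribution (as long as the population variance is finite)
If the sample is 4 times bigger, the sample mean is typically 2 times closer to the population mean, because the standard deviation of the sample mean is \(\sqrt{\mathbb{V}X/n}\)
The sample variance with the \(1/n\) factor is a biased estimate of the population variance; dividing by \(n-1\) instead removes the bias.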
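The "4 times bigger, 2 times closer" statement above can be checked with a small simulation. This sketch makes several assumptions for illustration: the population is uniform on \([0,1)\), the sample sizes are 25 and 100, and 2000 samples are drawn for each size with a fixed seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def spread_of_sample_mean(n, trials=2000):
    """Standard deviation of the sample mean over many simulated samples of size n."""
    means = []
    for _ in range(trials):
        sample = [random.random() for _ in range(n)]  # assumed population: uniform on [0, 1)
        means.append(sum(sample) / n)
    return statistics.pstdev(means)


small = spread_of_sample_mean(25)
large = spread_of_sample_mean(100)  # sample 4 times bigger

# The ratio should be close to 2, up to simulation noise.
print(small / large)
```

Because the result comes from random draws, the printed ratio is only approximately 2; more trials bring it closer.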