May 17, 2019
We will go one by one
Do not focus on the answer
Focus on how to get to the answer
There are populations and samples
Do not confuse them
Identify them
Populations are big, and sometimes theoretical
They have mean, variance and standard deviation
mean(population) var(population) sd(population)
Variance is the square of standard deviation
var(population) == sd(population)^2
Standard deviation is the square root of variance
sd(population) == sqrt(var(population))
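A minimal sketch of the two identities above, assuming a hypothetical simulated population (the name `population` and its parameters are illustrative, not from the course data). Because of floating-point rounding, `all.equal()` is a safer check than `==`.

```r
# Hypothetical population: 100000 simulated values (an assumption for illustration)
set.seed(1)
population <- rnorm(100000, mean = 50, sd = 10)

pop_var <- var(population)
pop_sd  <- sd(population)

# Variance is the square of the standard deviation (up to rounding)
check_square <- isTRUE(all.equal(pop_var, pop_sd^2))

# Standard deviation is the square root of the variance
check_root <- isTRUE(all.equal(pop_sd, sqrt(pop_var)))

print(c(check_square, check_root))
```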
An outcome is a random element of the population
Any outcome will probably be close to the population mean
How close? It depends on the population variance
The distance we allow between outcome and population mean depends on the confidence level
outcome will probably be between mean(pop)±k*sd(pop)
\[\langle X\rangle - k\,\mathbb{S}(X)\leq\text{outcome}\leq\langle X\rangle + k\,\mathbb{S}(X)\]
The key idea is: different values of k give different probabilities
(This is not always true)
When the population is Normal, then
k | Probability |
---|---|
2 | ≈ 95% |
3 | ≈ 99.7% |
qnorm(1-alpha/2) | 1-alpha |
Choose your own alpha
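As a quick check of the Normal table, the sketch below computes k from a chosen alpha and then recovers the probability from k. The value `alpha <- 0.05` is only an example.

```r
# Example confidence level (an illustrative choice)
alpha <- 0.05

# k for a Normal population: about 1.96 when alpha = 0.05
k <- qnorm(1 - alpha / 2)

# Probability of falling within k standard deviations of the mean
coverage <- pnorm(k) - pnorm(-k)   # equals 1 - alpha

# Probabilities for the round values of k in the table
p2 <- pnorm(2) - pnorm(-2)   # about 0.954
p3 <- pnorm(3) - pnorm(-3)   # about 0.997

print(c(k = k, coverage = coverage, p2 = p2, p3 = p3))
```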
(Chebyshev always works)
k | Probability (at least) |
---|---|
2 | 1-1/2² = 75% |
3 | 1-1/3² = 88.9% |
10 | 1-1/10² = 99% |
k | 1-1/k² |
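The Chebyshev values can be compared with what happens in a population that is clearly not Normal. This is a sketch under assumptions: the exponential population below is hypothetical, chosen only because it is skewed. The observed proportions should never fall below the Chebyshev bound.

```r
# Chebyshev lower bounds for the values of k in the table
k <- c(2, 3, 10)
chebyshev_bound <- 1 - 1 / k^2   # 0.75, 0.889, 0.99

# Hypothetical skewed (non-Normal) population, for illustration only
set.seed(3)
population <- rexp(100000, rate = 1)

# Proportion of the population within k standard deviations of the mean
observed <- sapply(k, function(ki) {
  mean(abs(population - mean(population)) <= ki * sd(population))
})

print(data.frame(k, chebyshev_bound, observed))
```

The bound is usually pessimistic: the observed proportion is often much higher than 1-1/k².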
A sample is a group of outcomes
It has size, mean, variance, and standard deviation
length(sample) mean(sample) var(sample) sd(sample)
Each sample is different (random)
Each sample mean is random
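A small sketch of these ideas, assuming a hypothetical simulated population and a sample size of 30 (both are illustrative choices). Two samples from the same population give two different means.

```r
# Hypothetical population (an assumption for illustration)
set.seed(4)
population <- rnorm(100000, mean = 50, sd = 10)

# Two random samples of 30 outcomes each
sample_a <- sample(population, size = 30)
sample_b <- sample(population, size = 30)

# Size, mean, variance and standard deviation of one sample
stats_a <- c(size = length(sample_a), mean = mean(sample_a),
             var = var(sample_a), sd = sd(sample_a))
print(stats_a)

# The second sample has a different mean, because each sample is random
print(mean(sample_b))
```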
If the sample size is large, then the sample mean has an approximately Normal distribution
If the sample mean has a Normal distribution, we need to know its parameters
The average of sample mean is the population mean
The standard deviation of sample mean is population standard deviation divided by the square root of sample length
The variance of sample mean is population variance divided by sample length
sample mean will probably be between mean(pop)±k*sd(pop)/sqrt(length(sample))
\[\langle X\rangle - k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\leq\text{sample mean}\leq\langle X\rangle + k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\]
The key idea is: different values of k give different probabilities
k | Probability |
---|---|
2 | ≈ 95% |
3 | ≈ 99.7% |
qnorm(1-alpha/2) | 1-alpha |
Choose your own alpha
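The behaviour of the sample mean can be checked by simulation. This is a sketch under assumptions: the skewed population, the sample size `n <- 50`, and the 2000 repetitions are all illustrative choices.

```r
# Hypothetical non-Normal population (illustration only)
set.seed(5)
population <- rexp(20000, rate = 1)

n <- 50   # sample size (an illustrative choice)

# Take many samples and keep the mean of each one
sample_means <- replicate(2000, mean(sample(population, size = n)))

# The average of the sample means is close to the population mean
avg_of_means <- mean(sample_means)

# Their standard deviation is close to sd(population) / sqrt(n)
sd_of_means <- sd(sample_means)
std_error   <- sd(population) / sqrt(n)

# Proportion of sample means inside mean(pop) ± k * sd(pop) / sqrt(n)
k <- qnorm(1 - 0.05 / 2)
coverage <- mean(abs(sample_means - mean(population)) <= k * std_error)

print(c(avg_of_means = avg_of_means, pop_mean = mean(population),
        sd_of_means = sd_of_means, std_error = std_error,
        coverage = coverage))
```

With a large enough sample size, the coverage should be near 95% even though the population itself is not Normal.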
In real life we do not know the population mean, and we want to know it.
We only know the sample mean and the sample variance
We can approximate population variance by sample variance
But we have to pay a cost
population mean will probably be between mean(sample)±k*sd(sample)/sqrt(length(sample))
\[\text{sample mean} - k\frac{\text{sample sd}}{\sqrt{\text{length}}}\leq \langle X\rangle \leq\text{sample mean} + k\frac{\text{sample sd}}{\sqrt{\text{length}}}\]
The key idea is: different values of k give different probabilities
Now k comes from Student’s t distribution instead of the Normal
This depends on the degrees of freedom: df = length(sample) - 1
k | Probability |
---|---|
qt(1-alpha/2, df) | 1-alpha |
Choose your own alpha
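Putting the last steps together, here is a minimal sketch of an interval for the population mean, assuming a hypothetical sample of 20 values (the vector `x` and `alpha <- 0.05` are illustrative). One way to read the "cost" is that the Student k is larger than the Normal k, so the interval is wider.

```r
# Hypothetical sample of 20 outcomes (an assumption for illustration)
set.seed(6)
x <- rnorm(20, mean = 50, sd = 10)

alpha <- 0.05
n  <- length(x)
df <- n - 1   # degrees of freedom

# k from Student's t distribution
k <- qt(1 - alpha / 2, df)

# Interval: mean(sample) ± k * sd(sample) / sqrt(length(sample))
half_width <- k * sd(x) / sqrt(n)
ci <- c(mean(x) - half_width, mean(x) + half_width)
print(ci)

# R's built-in t.test() reports the same interval
ci_builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
print(ci_builtin)

# The Student k is larger than the Normal k for the same alpha
print(c(student_k = k, normal_k = qnorm(1 - alpha / 2)))
```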