May 17, 2019

Practice

Today we check homework

We will go one by one

Do not focus on the answer

Focus on how to get to the answer

Ask all your questions today

Summary of last part

There are populations and samples

Do not confuse them

Identify them

Populations

Big. Sometimes theoretical

They have mean, variance and standard deviation

mean(population)
var(population)
sd(population)

Variance and Standard Deviation

Variance is the square of standard deviation

var(population) == sd(population)^2

Standard deviation is the square root of variance

sd(population) == sqrt(var(population))
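A small sketch of the two identities above. The vector `population` holds made-up numbers, used only for illustration. Note that R's `var()` and `sd()` divide by n-1, and that an exact `==` between computed decimals can fail because of rounding, so `all.equal()` is the safer comparison.

```r
# Hypothetical numbers standing in for a population (illustration only)
population <- c(2, 4, 4, 4, 5, 5, 7, 9)

var(population)
sd(population)

# Compare with a tolerance instead of ==, to avoid floating-point surprises
isTRUE(all.equal(var(population), sd(population)^2))
isTRUE(all.equal(sd(population), sqrt(var(population))))
```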

What variance tells us about outcomes

An outcome is a random element of the population

Most outcomes will be close to the population mean

How similar? It depends on the population variance

Outcome versus population mean

The distance between outcome and population mean depends on the confidence level

outcome will probably be between mean(pop)±k*sd(pop)

\[\langle X\rangle - k\,\mathbb{S}(X)\leq\text{outcome}\leq\langle X\rangle + k\,\mathbb{S}(X)\]

The key idea is: different k have different probability

If the population has Normal distribution

(This is not always true)

When the population is Normal, then

k                 Probability
2                 ≈ 95%
3                 ≈ 99.7%
qnorm(1-alpha/2)  1-alpha

Choose your own alpha
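The table for the Normal case can be checked directly in R. This is a minimal sketch; `prob_within` is a helper name introduced here for illustration, not something from the course material.

```r
# Probability that a Normal outcome is within k standard deviations of the mean
# (prob_within is a hypothetical helper name)
prob_within <- function(k) pnorm(k) - pnorm(-k)

prob_within(2)   # about 0.954
prob_within(3)   # about 0.997

# The other direction: choose alpha first, then get k
alpha <- 0.05
k <- qnorm(1 - alpha/2)
k                # about 1.96
prob_within(k)   # 1 - alpha
```

Changing `alpha` to 0.01 gives a larger `k`: more confidence costs a wider interval.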

If the population has another distribution

(Chebyshev always works)

k   Probability (at least)
2   1-1/2^2 = 75%
3   1-1/3^2 = 88.9%
10  1-1/10^2 = 99%
k   1-1/k^2
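Chebyshev gives a lower bound, so the real probability is usually higher. The sketch below compares the bound with what we observe in a population that is clearly not Normal. The exponential population and the helper names `chebyshev_bound` and `observed` are assumptions made for this example.

```r
# Chebyshev lower bound for the probability of being within k sd of the mean
chebyshev_bound <- function(k) 1 - 1/k^2
chebyshev_bound(c(2, 3, 10))

# Assumption: a skewed, non-Normal population (exponential with rate 1)
set.seed(1)
population <- rexp(100000, rate = 1)
m <- mean(population)
s <- sd(population)

# Fraction of the population that really is within k sd of the mean
observed <- function(k) mean(abs(population - m) <= k * s)

observed(2) >= chebyshev_bound(2)
observed(3) >= chebyshev_bound(3)
```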

Sample

A sample is a group of outcomes

It has size, mean, variance, and standard deviation

length(sample)
mean(sample)
var(sample)
sd(sample)
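For example, we can draw a sample from a hypothetical population and look at its four values. The Normal population with mean 170 and standard deviation 10 is an assumption made only for this sketch.

```r
set.seed(42)

# Assumption: hypothetical Normal population, mean 170 and sd 10
# (the variable is called sample, as in the slides; R can still find the
# function sample() when it is called with parentheses)
sample <- rnorm(25, mean = 170, sd = 10)

length(sample)   # sample size
mean(sample)     # near 170, but not exactly
var(sample)
sd(sample)       # near 10, but not exactly
```

Run it again with another seed: the size stays the same, the other three values change.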

Sample Mean

Each sample is different (random)

Each sample mean is random

If the sample size is large, then the sample mean has an approximately Normal distribution

Sample Mean Normal parameters

If the sample mean has a Normal distribution, we need to know its parameters

The average of sample mean is the population mean

The standard deviation of sample mean is population standard deviation divided by the square root of sample length

The variance of sample mean is population variance divided by sample length
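These three rules can be checked with a simulation: take many samples, keep the mean of each one, and look at how those means behave. The population values (mean 50, sd 12) and the sample size 36 are assumptions chosen for this sketch.

```r
set.seed(123)

# Assumption: hypothetical Normal population and sample size
pop_mean <- 50
pop_sd   <- 12
n        <- 36

# Take 10000 samples of size n and keep the mean of each one
sample_means <- replicate(10000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

mean(sample_means)   # close to pop_mean
sd(sample_means)     # close to pop_sd / sqrt(n)
pop_sd / sqrt(n)     # 2
var(sample_means)    # close to pop_sd^2 / n
```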

Predicting Sample Mean

sample mean will probably be between mean(pop)±k*sd(pop)/sqrt(length(sample))

\[\langle X\rangle - k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\leq\text{sample mean}\leq\langle X\rangle + k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\]

The key idea is: different k have different probability

Sample mean has Normal distribution

k                 Probability
2                 ≈ 95%
3                 ≈ 99.7%
qnorm(1-alpha/2)  1-alpha

Choose your own alpha
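A sketch of the prediction interval for the sample mean, followed by a simulation that counts how often the sample mean really falls inside it. The population values and the sample size are assumptions for illustration.

```r
# Assumption: hypothetical population mean, sd, and sample size
pop_mean <- 50
pop_sd   <- 12
n        <- 36

alpha <- 0.05
k <- qnorm(1 - alpha/2)

lower <- pop_mean - k * pop_sd / sqrt(n)
upper <- pop_mean + k * pop_sd / sqrt(n)
c(lower, upper)   # roughly 46.08 and 53.92

# How often does the sample mean fall inside the interval?
set.seed(2019)
sample_means <- replicate(10000, mean(rnorm(n, pop_mean, pop_sd)))
mean(sample_means >= lower & sample_means <= upper)   # close to 1 - alpha
```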

Inverse problem: from sample to population

In real life we do not know the population mean, and we want to find it.

We only know sample mean, and sample variance

We can approximate population variance by sample variance

But we have to pay a cost

Predicting Population Mean

population mean will probably be between mean(sample)±k*sd(pop)/sqrt(length(sample)), with sd(pop) approximated by sd(sample)

\[\text{sample mean} - k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\leq \langle X\rangle \leq\text{sample mean} + k\frac{\mathbb{S}(X)}{\sqrt{\text{length}}}\]

The key idea is: different k have different probability

The cost of not knowing the population variance

Now k comes from Student's t distribution instead of the Normal distribution

This depends on degrees of freedom (sample length-1)

k Probability
qt(1-alpha/2, df) 1-alpha

Choose your own alpha
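Putting the inverse problem together: from one sample we build an interval for the population mean, using `qt()` for k and the sample standard deviation in place of the population one. The sample `x` is simulated from made-up values, only so that the sketch can run. The built-in `t.test()` should give the same interval.

```r
set.seed(7)

# Assumption: a hypothetical sample of 15 outcomes
x <- rnorm(15, mean = 50, sd = 12)
n <- length(x)

alpha <- 0.05
k <- qt(1 - alpha/2, df = n - 1)

# Interval for the population mean, using the sample sd
ci <- mean(x) + c(-1, 1) * k * sd(x) / sqrt(n)
ci

# The same interval from the built-in test
t.test(x, conf.level = 1 - alpha)$conf.int

# The cost: Student's k is larger than the Normal k, so the interval is wider
k > qnorm(1 - alpha/2)
```

As the sample gets longer, `qt(1-alpha/2, df)` gets closer to `qnorm(1-alpha/2)` and the cost becomes small.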