If you do not have access to the age distribution and know only its standard deviation
We are looking for the population mean \(\mathbb{E}X\)
We know the population variance \(\mathbb{V}X\)
We interview \(n\) people and calculate the sample mean \(\bar{X}\)
The population mean is probably in the interval \[\left[\bar{X}-c\sqrt{\mathbb{V}(X)/n}, \bar{X}+c\sqrt{\mathbb{V}(X)/n}\right]\]
By Chebyshev's inequality, the probability that it is in this interval is at least \(1-1/c^2\)
We want the interval width to be less than 5 (or 1, or 1/12) years
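Let's call the target width \(y\). The interval width is \(2c\sqrt{\mathbb{V}(X)/n}\), so we need \[2c\sqrt{\mathbb{V}(X)/n}\le y\] therefore \[n\ge 4c^2\,\mathbb{V}(X)/y^2\]
The variance \(\mathbb{V}(X)\) is 473.23 yr²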
c | Probability (at least) | n for 5 yr | n for 1 yr | n for 1 month |
---|---|---|---|---|
2 | 75% | 303 | 7,572 | 1,090,322 |
3 | 89% | 682 | 17,037 | 2,453,225 |
5 | 96% | 1,893 | 47,323 | 6,814,512 |
10 | 99% | 7,572 | 189,292 | 27,258,048 |
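The sample-size bound \(n\ge 4c^2\,\mathbb{V}(X)/y^2\) can be evaluated directly. The following is a minimal Python sketch; the function name `chebyshev_sample_size` is an illustrative choice, and the variance is the 473.23 yr² quoted in the text.

```python
import math

def chebyshev_sample_size(variance, width, c):
    """Smallest integer n such that 2*c*sqrt(variance/n) <= width."""
    bound = 4 * c**2 * variance / width**2
    # Rounding to 6 decimals guards against floating-point noise
    # before taking the ceiling.
    return math.ceil(round(bound, 6))

VARIANCE = 473.23  # yr^2, value quoted in the text

for c in (2, 3, 5, 10):
    prob = 1 - 1 / c**2  # Chebyshev lower bound on the probability
    sizes = [chebyshev_sample_size(VARIANCE, w, c) for w in (5, 1, 1 / 12)]
    print(c, f"{prob:.0%}", sizes)
```

Because the width appears squared, making the interval 5 times narrower requires 25 times more interviews.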
Homework Question 1
A company wants to offer insurance to protect against the economic damage of COVID-19.
We have \(n\) people paying \(x\) each, and \(S\) people getting sick and receiving \(y\) each. The result is \[R=nx-Sy\] For our analysis, \(n, x\) and \(y\) are fixed, but \(S\) is a random variable. Thus \(R\) is a random variable.
We want to know \(\mathbb{E}(R)\)
How can we calculate it?
Using the linearity of the expected value, we have \[\mathbb{E}(R)=\mathbb{E}(nx-Sy)=nx-\mathbb{E}(S)y\] So we need to calculate \(\mathbb{E}(S)\)
What do we know about \(S\)?
There are \(n\) people, each one can get sick with probability \(p\)
Each person is a "coin" that comes up "sick" with probability \(p\)
Thus \(S\) is a sum of coins
Assuming that each person gets sick independently, then \[S \sim \text{Binom}(n,p)\] Therefore, we immediately know that \[\mathbb{E}(S)=np\qquad \mathbb{V}(S)=np(1-p)\]
After one year, the result \(R\) will probably be somewhere in the interval \[\left[\mathbb{E}(R)-c\sqrt{\mathbb{V}(R)}, \mathbb{E}(R)+c\sqrt{\mathbb{V}(R)}\right]\] Since \(\mathbb{V}(R)=y^2\,\mathbb{V}(S)\), that is \[\left[nx-npy-cy\sqrt{np(1-p)}, nx-npy+cy\sqrt{np(1-p)}\right]\]
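As a rough illustration of the interval for \(R\), here is a small Python sketch. The function name `result_interval` and the numbers in the usage example (1000 people, payment 15, payout 100) are hypothetical and do not come from the homework statement.

```python
import math

def result_interval(n, p, x, y, c):
    """Chebyshev-style interval for R = n*x - S*y with S ~ Binom(n, p)."""
    mean_s = n * p                # E(S)
    var_s = n * p * (1 - p)       # V(S)
    mean_r = n * x - mean_s * y   # E(R) = nx - E(S) y
    sd_r = y * math.sqrt(var_s)   # sqrt(V(R)), since V(R) = y^2 V(S)
    return mean_r - c * sd_r, mean_r + c * sd_r

# Hypothetical example values, chosen only for illustration.
low, high = result_interval(n=1000, p=0.1, x=15, y=100, c=3)
print(f"R is probably between {low:.0f} and {high:.0f}")
```

The fixed income \(nx\) only shifts the interval; its width depends on \(y\), \(n\), \(p\) and \(c\).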
How do we choose \(x\) and \(y\)?
We want \(R\ge 0,\) so the lower limit of the interval must not be negative \[nx-npy-cy\sqrt{np(1-p)}\ge 0\] Dividing by \(ny\), we get \[\frac{x}{y}\ge p+c\sqrt{\frac{p(1-p)}{n}}\]
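Assuming \(p=0.1,\) the ratio \(x/y\) must be at least the following, depending on the number of people \(n\)
c | Probability (at least) | n = 10 | n = 100 | n = 1,000 | n = 10,000 | n = 100,000 |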
---|---|---|---|---|---|---|
2 | 75% | 0.29 | 0.16 | 0.12 | 0.11 | 0.10 |
3 | 89% | 0.38 | 0.19 | 0.13 | 0.11 | 0.10 |
5 | 96% | 0.57 | 0.25 | 0.15 | 0.12 | 0.10 |
10 | 99% | 1.05 | 0.40 | 0.19 | 0.13 | 0.11 |
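The table entries follow from the bound \(x/y\ge p+c\sqrt{p(1-p)/n}\). A minimal Python sketch that evaluates it, with the illustrative function name `min_premium_ratio`:

```python
import math

def min_premium_ratio(p, n, c):
    """Smallest x/y that keeps the lower limit of the interval for R at zero."""
    return p + c * math.sqrt(p * (1 - p) / n)

P_SICK = 0.1  # probability assumed in the text

for c in (2, 3, 5, 10):
    row = [min_premium_ratio(P_SICK, n, c) for n in (10, 100, 1000, 10000, 100000)]
    print(c, " ".join(f"{r:.2f}" for r in row))
```

As \(n\) grows, the ratio approaches \(p\): with many customers the company can charge close to the expected cost per person.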
We used Chebyshev's inequality, which does not need any hypothesis about the shape of the distribution
But we have more information. We know that \(S\) is a Binomial random variable
Therefore we can make narrower confidence intervals
We know that \[\mathbb{P}(S=k\mid n\text{ in total})=\binom{n}{k} p^k(1-p)^{n-k}\] We can calculate \(\binom{n}{k}\) using Pascal's triangle, even in Excel
Pascal's Triangle
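Each row of Pascal's triangle is obtained by adding adjacent entries of the previous row, and row \(n\) contains the coefficients \(\binom{n}{0},\dots,\binom{n}{n}\). A short Python sketch of this construction, with the illustrative function name `pascal_row`:

```python
def pascal_row(n):
    """Return row n of Pascal's triangle: [C(n,0), C(n,1), ..., C(n,n)]."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

for n in range(6):
    print(pascal_row(n))
```

The same rule works in a spreadsheet: every cell is the sum of the cell above it and the cell above and to the left.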
\[\mathbb{P}(S\le k)=\sum_{j=0}^k \mathbb{P}(S=j)\]
Good tools include functions to calculate the usual distributions
In Excel we have BINOM.DIST(k, n, p, cumulative)
In R we have dbinom() for \(\mathbb{P}(S=k)\) and pbinom() for \(\mathbb{P}(S\le k)\)
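To see how much the Binomial hypothesis helps, the exact probability that \(S\) falls within \(c\) standard deviations of its mean can be compared with the Chebyshev bound \(1-1/c^2\). The sketch below uses only the Python standard library. The function names `binom_pmf`, `binom_cdf` and `coverage` are illustrative, and \(n=1000\) is an assumed example size. Multiplying `math.comb` by floating-point powers works for moderate \(n\) such as this one, but would overflow for much larger \(n\).

```python
import math

def binom_pmf(k, n, p):
    """P(S = k) for S ~ Binom(n, p), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(S <= k), the sum of P(S = j) for j = 0..k."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

def coverage(n, p, c):
    """Exact P(|S - np| <= c*sd) for S ~ Binom(n, p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lo = max(math.ceil(mean - c * sd), 0)
    hi = min(math.floor(mean + c * sd), n)
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

# Assumed example: n = 1000 people, p = 0.1 as in the text.
for c in (2, 3, 5):
    print(c, f"Chebyshev >= {1 - 1 / c**2:.3f}", f"exact = {coverage(1000, 0.1, c):.5f}")
```

The exact probability is always at least the Chebyshev bound, so the same confidence can be reached with a smaller \(c\), and therefore with a narrower interval.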
Now we have a coin \(X\) with two possible outcomes: +1 and -1
To make life easy, we assume \(p=0.5\)
What are the expected value and variance of \(X\)?
We throw the coin \(n\) times, and we calculate \(Y\), the sum of all \(X_i\) \[Y=\sum_{i=1}^n X_i\]
What are the expected value and variance of \(Y\)?
Now consider \(Z_n=Y/\sqrt{n}\)
It is easy to see that \(\mathbb{E}Z_n = 0\) and \(\mathbb{V}Z_n = 1\), independent of \(n\)
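The claim about \(Z_n\) can be checked exactly for small \(n\). If \(H\) of the \(n\) coins show +1, then \(Y=2H-n\) and \(H\sim\text{Binom}(n, 1/2)\). The Python sketch below builds the distribution of \(Y\) with exact fractions; the function names are illustrative.

```python
import math
from fractions import Fraction

def y_distribution(n):
    """Exact distribution of Y = X_1 + ... + X_n for fair +1/-1 coins."""
    total = 2**n
    # h coins show +1 and n - h show -1, so Y = 2h - n.
    return {2 * h - n: Fraction(math.comb(n, h), total) for h in range(n + 1)}

def mean_and_variance(dist):
    """Expected value and variance of a finite distribution {value: probability}."""
    mean = sum(v * pr for v, pr in dist.items())
    var = sum((v - mean) ** 2 * pr for v, pr in dist.items())
    return mean, var

for n in (1, 2, 10, 50):
    mean_y, var_y = mean_and_variance(y_distribution(n))
    # Dividing Y by sqrt(n) divides the variance by n.
    print(n, mean_y, var_y, var_y / n)
```

Since \(\mathbb{V}Y=n\), dividing \(Y\) by \(\sqrt{n}\) gives variance \(n/n=1\) for every \(n\).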
The possible values of \(Z_n\) are in general not integers. Unless \(n\) is a perfect square, the nonzero values are not even rational
What happens with \(Z_n\) when \(n\) is really big?
When \(n\to\infty,\) the distribution of \(Z_n=\sum_i X_i/\sqrt{n}\) converges to a Normal distribution \[\lim_{n\to\infty} Z_n \sim \text{Normal}(0,1)\]
If \(X_i\) is a set of independent, identically distributed random variables, with expected value \[\mathbb{E}X_i=\mu\quad\text{for all }i\] and variance \[\mathbb{V}X_i=\sigma^2\quad\text{for all }i\] then \[\lim_{n\to\infty} \frac{\sum_i (X_i-\mu)}{\sigma\sqrt{n}} \sim \text{Normal}(0,1)\]
Equivalently, under the same hypotheses but without dividing by \(\sigma\), \[\lim_{n\to\infty} \frac{\sum_i (X_i-\mu)}{\sqrt{n}} \sim \text{Normal}(0, \sigma^2)\]
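For the fair +1/-1 coin, the convergence can be observed numerically without simulation: \(\mathbb{P}(Z_n\le z)\) is an exact Binomial sum, and it can be compared with the standard Normal distribution function. This is a minimal Python sketch with illustrative function names; the Normal distribution function is written with `math.erf`.

```python
import math

def normal_cdf(z):
    """Distribution function of Normal(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def coin_sum_cdf(z, n):
    """Exact P(Z_n <= z), where Z_n = Y/sqrt(n) and Y sums n fair +1/-1 coins."""
    # Y = 2H - n with H ~ Binom(n, 1/2), so Z_n <= z  <=>  H <= (z*sqrt(n) + n)/2.
    h_max = min(math.floor((z * math.sqrt(n) + n) / 2), n)
    if h_max < 0:
        return 0.0
    return sum(math.comb(n, h) for h in range(h_max + 1)) / 2**n

for n in (10, 100, 1000):
    print(n, f"exact = {coin_sum_cdf(1.0, n):.4f}", f"normal = {normal_cdf(1.0):.4f}")
```

The exact value moves in steps because \(Z_n\) is discrete, but the gap to the Normal value shrinks as \(n\) grows.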