Last class we showed that the sum of the probabilities of all outcomes is 1: \[ℙ(\{a_1\}) + ℙ(\{a_2\}) + … + ℙ(\{a_n\})=1\] If we know these values, we can calculate the probability of any event.
The set of values, for all \(i\), \[p(a_i) = ℙ(\{a_i\})= ℙ(\textrm{outcome is exactly }a_i)\] is called the probability distribution.
This definition makes sense only if we agree on what all the possible outcomes are.
In other words, we must agree on what \(Ω\) is
Then the probability distribution is a function \[p: Ω → [0,1]\]
Notice that there may be more than one way to define \(Ω\)
The easiest case to study is shuffling a deck of cards
We shuffle the cards several times, until we can no longer know which card will come first
We are interested in the event “the next card will be green”
Let’s assume that we know how many cards of each color are in the deck
There are \(n_c\) cards of color \(c\in\){“red”,“green”,“blue”, “yellow”}
There are \(N=∑ n_c\) cards in total
If we do not have any solid reason to expect any particular order of cards, then each individual card has the same probability \(1/N\)
The probability of “first color \(c\)” is \[ℙ(\textrm{color is }c)=\frac{n_c}{N}\]
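We can check this rule with exact arithmetic. A minimal sketch, assuming a hypothetical deck with made-up counts per color:

```python
from fractions import Fraction

# Hypothetical deck counts for each color (illustration only)
counts = {"red": 10, "green": 6, "blue": 8, "yellow": 4}
N = sum(counts.values())  # total number of cards

# P(color is c) = n_c / N for each color c
prob = {c: Fraction(n, N) for c, n in counts.items()}
print(prob["green"])  # 6/28, reduced to 3/14
```

Using `Fraction` keeps the proportions exact, so the probabilities sum to exactly 1.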
We will continue drawing cards, so the proportions will change
Let’s say we got color \(c_1\) in the first draw
Now we have \(N-1\) cards in total, and there are \(n_{c_1}-1\) cards of color \(c_1\)
The probability of “second color \(c\)” is
\[ℙ(\textrm{second color is }c|\textrm{first color is }c_1)=\begin{cases} \frac{n_c}{N-1} &\textrm{if }c≠c_1\\ \frac{n_c-1}{N-1} &\textrm{if }c=c_1 \end{cases}\]
It gets complicated
This formula is needed because our measurement changes the experiment: drawing a card removes it from the deck
There are two cases where the proportions do not change: sampling with replacement, and sampling from a population so large that removing one element barely changes the proportions
In practice, we often sample from a very large population, and we model it as sampling with replacement
If we replace the card into the deck after we see it, we will have
\[ℙ(\textrm{second color is }c|\textrm{first color is }c_1)=ℙ(\textrm{color is }c)=\frac{n_c}{N}\]
Notice that this means that the second result is independent of the first result, and so on
Moreover, the distribution is identical in each case
This is a very important case, and we give it a name
Independent, Identically Distributed (i.i.d.)
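Sampling with replacement is easy to simulate: each draw has the same distribution, so the relative frequency of each color should approach \(n_c/N\). A sketch with the same made-up deck counts:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
counts = {"red": 10, "green": 6, "blue": 8, "yellow": 4}  # made-up deck
deck = [c for c, n in counts.items() for _ in range(n)]

# Sampling WITH replacement: each draw is i.i.d. with P(c) = n_c / N
draws = [random.choice(deck) for _ in range(100_000)]
freq_green = draws.count("green") / len(draws)
print(freq_green)  # close to 6/28 ≈ 0.214
```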
For a coin toss, everything can be reduced to the single value \[p=ℙ(\text{'Head'})\]
We say that the probability distribution of the coin depends on the parameter \(p\)
(In math this is called a Bernoulli distribution)
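A Bernoulli coin can be simulated from a single uniform random number. A minimal sketch, with an assumed bias \(p=0.3\):

```python
import random

random.seed(1)  # fixed seed for reproducibility
p = 0.3         # assumed bias, P('Head') = p

def bernoulli(p):
    """One Bernoulli(p) trial: 1 for Head, 0 for Tail."""
    return 1 if random.random() < p else 0

flips = [bernoulli(p) for _ in range(100_000)]
print(sum(flips) / len(flips))  # relative frequency, close to p = 0.3
```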
What is the probability that we get \(k\) heads if we throw \(N\) coins?
This happens to be one of the most useful cases for us
Let’s assume that all coins are i.i.d. with \(ℙ(\text{'Head'})=p\)
To simplify, we will call \(ℙ(\text{'Tail'})=q\) so \(p+q=1\)
To understand this case, we should start with small values of \(N\)
For \(N=2\) we get \[1⋅ q^2,\quad 2⋅ p q,\quad 1⋅ p^2\]
For \(N=3\) we get \[1⋅ q^3,\quad 3⋅ pq^2,\quad 3⋅ p^2q,\quad 1⋅ p^3\]
The rule for the coefficients is the same as in the binomial theorem
We get \[ℙ(k\textrm{ Heads in }N\textrm{ coins})= \binom{N}{k} p^k q^{(N-k)}\]
The numbers \(\binom{N}{k}\) are found in Pascal’s triangle
One way to remember it is to use the formula \[(p+q)^N =\sum_{k=0}^N \binom{N}{k} p^k q^{(N-k)}\]
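We can verify numerically that the binomial probabilities sum to \((p+q)^N = 1\). A sketch with an assumed bias \(p=0.3\) and \(N=10\) tosses:

```python
from math import comb

p, N = 0.3, 10  # assumed coin bias and number of tosses
q = 1 - p

# P(k Heads in N coins) = C(N, k) * p^k * q^(N-k), for k = 0..N
pmf = [comb(N, k) * p**k * q**(N - k) for k in range(N + 1)]

# Binomial theorem: the probabilities sum to (p + q)^N = 1
total = sum(pmf)
print(total)  # 1, up to floating-point rounding
```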
This is why we call it the binomial distribution
The most important applications of probabilities are when the outcomes are numbers
More in general, we care about numbers that depend on the experiment outcome
If the outcomes are numbers, we can use them in formulas
For example, if coins are “Heads \(↦1\) and Tails \(↦0\)”, then \[ℙ(k\textrm{ Heads in }N\textrm{ coins})\] is the same as \[ℙ\left(\sum_{i=1}^N X_i=k \,\middle|\, X_i \textrm{ are iid coins}\right)\]
In everyday life, if \(𝐱 = (x_1,…,x_N)\) we have \[\text{mean}(𝐱)=\bar{\mathbf x} = \frac{1}{N}\sum_i x_i\]
Now, if we count how many of each different value are there \[n(x) = \textrm{number of times that }(x_i=x)\] Then we can write \[\text{mean}(𝐱)=\bar{\mathbf x} =\sum_x x \frac{n(x)}{N}\]
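The two ways of computing the mean agree, as a quick check shows. The sample below is made up for illustration:

```python
from collections import Counter

x = [2, 3, 3, 5, 2, 3]  # a small made-up sample
N = len(x)

# Direct mean: (1/N) * sum of x_i
mean_direct = sum(x) / N

# Same mean via proportions n(x)/N
counts = Counter(x)  # n(x) = number of times x appears
mean_props = sum(v * n / N for v, n in counts.items())
print(mean_direct, mean_props)  # both equal 3, up to rounding
```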
In other words, to calculate the average we need to know the proportions
For any random variable \(X\) we define the expected value (also called mean value) of \(X\) as its average over the population \[𝔼X=\sum x\, ℙ(X=x)\] Notice that \(X\) is a random variable but \(𝔼X\) is not.
Generalizing, we can get the expected value of any function of \(X\) \[𝔼\,f(X)=\sum f(x)\, ℙ(X=x)\]
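For a discrete distribution this is a one-line sum. A sketch with a made-up probability mass function:

```python
# A small made-up discrete distribution (illustration only)
pmf = {0: 0.5, 1: 0.3, 2: 0.2}  # P(X = x); the values must sum to 1

def expect(f, pmf):
    """E f(X) = sum over x of f(x) * P(X = x)."""
    return sum(f(x) * p for x, p in pmf.items())

EX = expect(lambda x: x, pmf)      # E X   = 0*0.5 + 1*0.3 + 2*0.2 = 0.7
EX2 = expect(lambda x: x**2, pmf)  # E X^2 = 0*0.5 + 1*0.3 + 4*0.2 = 1.1
```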
If \(X\) and \(Y\) are two random variables, and \(\alpha\) is a real number, then
\[𝔼(X + Y)=𝔼X + 𝔼Y\] \[𝔼(α X)=α\, 𝔼X\]
So, if \(α\) and \(β\) are real numbers, then
\[𝔼(α X +\beta Y)=α\, 𝔼X +β\, 𝔼Y\]
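Linearity can be verified exactly on a small example. Below, \(X\) and \(Y\) are two fair dice and \(α=2\), \(β=-3\) are made-up constants; `fractions` keeps the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# A fair die as an explicit pmf (illustration only)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
alpha, beta = Fraction(2), Fraction(-3)

EX = sum(x * p for x, p in pmf.items())  # 7/2 for a fair die

# Build the pmf of alpha*X + beta*Y over all pairs (x, y).
# (Independence is used to build the joint pmf, but linearity
# of the expectation holds even without it.)
joint = {}
for x, y in product(pmf, pmf):
    v = alpha * x + beta * y
    joint[v] = joint.get(v, Fraction(0)) + pmf[x] * pmf[y]

lhs = sum(v * p for v, p in joint.items())  # E(alpha X + beta Y)
rhs = alpha * EX + beta * EX                # alpha E X + beta E Y
print(lhs == rhs)  # True
```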
Exercise: prove it yourself
The variance of the population is defined with the same idea as the sample variance \[𝕍 X=𝔼(X-𝔼X)^2\] Notice that the variance has squared units
In most cases it is more comfortable to work with the standard deviation \(\sigma=\sqrt{𝕍X}.\)
In that case the population variance can be written as \(\sigma^2\)
We can rewrite the variance of the population with a simpler formula: \[𝕍X=𝔼(X-𝔼X)^2=𝔼(X^2)-(𝔼X)^2\] because \[𝔼(X-𝔼X)^2=𝔼(X^2-2X𝔼X+(𝔼X)^2)\\=𝔼(X^2)-2𝔼(X𝔼X)+𝔼(𝔼X)^2\] but \(𝔼X\) is a non-random number, so \(𝔼(X𝔼X)=(𝔼X)^2\) and \(𝔼(𝔼X)^2=(𝔼X)^2\)
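Both formulas give the same number, as an exact check on a fair die confirms:

```python
from fractions import Fraction

# Variance of a fair die, computed two ways (illustration)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())       # E X   = 7/2
EX2 = sum(x**2 * p for x, p in pmf.items())   # E X^2 = 91/6

var_def = sum((x - EX)**2 * p for x, p in pmf.items())  # E(X - E X)^2
var_short = EX2 - EX**2                                 # E(X^2) - (E X)^2
print(var_def, var_short)  # both 35/12
```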
If \(X\) and \(Y\) are two independent random variables, and \(\alpha\) is a real number, then \[𝕍(X + Y)=𝕍X + 𝕍Y\] \[𝕍(α X)=α^2\, 𝕍X\]
To prove the first equation we use that \(𝔼(XY)=𝔼X\,𝔼Y,\) which is true when \(X\) is independent of \(Y\)
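The additivity of the variance for independent variables can also be checked exactly, again with two fair dice:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: check V(X + Y) = V(X) + V(Y)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def var(pmf):
    """V X = E(X - E X)^2 for a discrete pmf."""
    EX = sum(x * p for x, p in pmf.items())
    return sum((x - EX)**2 * p for x, p in pmf.items())

# pmf of the sum X + Y, using independence: P(x, y) = P(x) * P(y)
sum_pmf = {}
for x, y in product(pmf, pmf):
    sum_pmf[x + y] = sum_pmf.get(x + y, Fraction(0)) + pmf[x] * pmf[y]

print(var(sum_pmf) == 2 * var(pmf))  # True: 35/12 + 35/12
```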