People often think that probabilities are about games
Instead, they are really tools for thinking
Thinking about decisions when we have incomplete information
Thinking about the future
Thinking about the meaning of our experimental results
Travel insurance, health insurance, and other kinds of insurance
We use games and maybe other toys that are easy to understand
They are here only to give us easy examples
Each device has a set of possible outcomes:
For example a die has the following outcomes
⚀ ⚁ ⚂ ⚃ ⚄ ⚅
🂡 🂢 🂣 🂤 🂥 🂦 🂧 🂨 🂩 🂪 🂫 🂭 🂮 🂱 🂲 🂳 🂴 🂵 🂶 🂷 🂸 🂹 🂺 🂻 🂽 🂾 🃁 🃂 🃃 🃄 🃅 🃆 🃇 🃈 🃉 🃊 🃋 🃍 🃎 🃑 🃒 🃓 🃔 🃕 🃖 🃗 🃘 🃙 🃚 🃛 🃝 🃞 🃟
♠︎♣︎♡♢
DNA can be represented with four symbols
A, C, G, T
A coin has two outcomes: Head and Tail, also written as H and T
just throw the dice
c("H","T")
1:6
c("a","c","g","t")
LETTERS
letters
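The outcome sets above are plain R vectors. As a minimal sketch, they can be stored under names and measured with length(). The variable names and the way the card deck is built are assumptions made for illustration; they do not come from the slides, and the deck here has 52 cards with no jokers.

```r
# Outcome sets stored as plain R vectors (names are illustrative).
coin <- c("H", "T")
die <- 1:6
dna <- c("a", "c", "g", "t")

# A 52-card deck: every combination of 13 ranks and 4 suits (no jokers).
suits <- c("spades", "hearts", "diamonds", "clubs")
ranks <- c("A", 2:10, "J", "Q", "K")
deck <- paste(rep(ranks, times = length(suits)),
              rep(suits, each = length(ranks)))

# The number of possible outcomes of each device.
print(length(coin))
print(length(die))
print(length(dna))
print(length(deck))
```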
sample()
Try this
[1] "E" "I" "B" "Q" "A" "L" "H" "V" "K" "Z" "T"
[12] "U" "J" "D" "N" "P" "O" "G" "C" "F" "M" "X"
[23] "S" "W" "Y" "R"
[1] "G" "I" "K" "A" "X" "L" "D" "Q" "O" "S" "N"
[12] "P" "U" "R" "Y" "V" "E" "B" "H" "C" "M" "F"
[23] "J" "Z" "T" "W"
sample() is shuffling
The output has the same elements as the input, but in a different order
Each element appears only once
The order changes every time
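A small sketch of the shuffling behaviour described above, assuming the input is the built-in vector LETTERS. The variable name shuffled is only illustrative. The printed order will differ on every run.

```r
# Shuffle the 26 upper-case letters.
shuffled <- sample(LETTERS)
print(shuffled)

# Same elements as the input, each exactly once; only the order differs.
print(length(shuffled) == length(LETTERS))
print(setequal(shuffled, LETTERS))
print(anyDuplicated(shuffled) == 0)
```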
To use sample() we must give it a set of possible outcomes
(In math this is called Ω, for short)
The result of sample() is called an outcome or a realization
Try this
[1] "Q" "M" "V" "L" "O" "F" "W" "E" "Y" "J"
We get 10 letters
Some, but not all possible outcomes
Each outcome appears only once
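One way to get a result like the one above is to pass a size to sample(). This is a sketch that assumes the set of outcomes is LETTERS; the variable name draw is illustrative.

```r
# Take 10 of the 26 letters, without putting any of them back.
draw <- sample(LETTERS, size = 10)
print(draw)

# We get 10 letters, all from the set, and none of them repeated.
print(length(draw))
print(all(draw %in% LETTERS))
print(anyDuplicated(draw) == 0)
```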
Try this
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
Problem: We run out of outcomes. What can we do?
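The error above can be reproduced by asking for more values than the set contains. This sketch assumes a set of four DNA letters and uses tryCatch() so the script keeps running and only the error message is stored. The names dna and result are illustrative.

```r
dna <- c("A", "C", "G", "T")

# Asking for 10 values from a set of 4, without replacement, fails.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  sample(dna, size = 10),
  error = function(e) conditionMessage(e)
)
print(result)
```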
Try this
[1] "T" "A" "T" "C" "C" "C" "T" "G" "G" "G"
Each element can appear several times
Shuffle, take one, put it back in the set, and repeat
This is sample() with replace=TRUE
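A sketch of sampling with replacement, assuming the same four DNA letters as the set of outcomes. The names dna and draw are illustrative.

```r
dna <- c("A", "C", "G", "T")

# Each letter is put back after it is drawn, so it can be drawn again
# and the sample can be larger than the set.
draw <- sample(dna, size = 10, replace = TRUE)
print(draw)

print(length(draw))
print(all(draw %in% dna))
```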
If there are more G and C than A and T, we can try
[1] "C" "A" "G" "G" "G" "C" "T" "A" "T" "G"
but this becomes hard if the proportions are not a nice fraction
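One possible way to favour G and C is to repeat them inside the set of outcomes. The slides do not show the exact input, so the 2:1 ratio below is a hypothetical choice, and the name biased_set is illustrative.

```r
# Hypothetical set: C and G appear twice, A and T once, so C and G
# are each drawn about twice as often as A or T.
biased_set <- c("A", "C", "C", "G", "G", "T")

draw <- sample(biased_set, size = 10, replace = TRUE)
print(draw)

# Share of each letter inside the set itself.
print(table(biased_set) / length(biased_set))
```

With a ratio such as 37 to 13 the set would need 50 elements, which is why this approach becomes awkward.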
If there are more G and C than A and T, we can try
[1] "T" "C" "T" "C" "C" "C" "G" "T" "C" "T"
The input prob= must be a vector of the same size as the set of outcomes
The proportions should sum to 1
(if they do not, sample() rescales them for us)
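A sketch of the prob= input, assuming four DNA letters and hypothetical weights of 1, 2, 2, 1. The weights are not from the slides. Here they are divided by their sum explicitly so the proportions add up to 1.

```r
dna <- c("A", "C", "G", "T")

# Hypothetical weights: C and G twice as likely as A and T.
weights <- c(1, 2, 2, 1)
probs <- weights / sum(weights)  # one proportion per outcome
print(probs)
print(sum(probs))

draw <- sample(dna, size = 10, replace = TRUE, prob = probs)
print(draw)
```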
Each experiment or random system may have some preferred outcomes
All outcomes are possible, but some can be more probable than others
We want to know the probability of each outcome
We have three cases
We draw more values than the set contains, and that is why we use replace=TRUE
 A  C  G  T
 9  7 12 12

 A  C  G  T
 8  5 13 14

 A  C  G  T
 8 11 11 10
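Counts like the ones above can be obtained by passing a sample to table(). This is a sketch; the size of 40 is inferred from the fact that each row of counts above sums to 40, and the names dna and counts are illustrative.

```r
dna <- c("A", "C", "G", "T")

# Draw 40 letters with replacement and count how often each one appears.
counts <- table(sample(dna, size = 40, replace = TRUE))
print(counts)

# The counts always add up to the sample size.
print(sum(counts))
```

Running it several times gives different counts each time, as in the three tables above.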
When size increases, frequencies increase
  A   C   G   T
 91 103  93 113

  A   C   G   T
107  93 109  91

  A   C   G   T
 98  94  98 110
With a bigger size, frequencies change less in proportion to the total
   A    C    G    T
1021  976  977 1026

   A    C    G    T
1020  984 1022  974

   A    C    G    T
 995 1052  986  967
With an even bigger size
    A     C     G     T
 9991 10022 10004  9983

    A     C     G     T
10172  9980  9854  9994

    A     C     G     T
10074  9923 10012  9991

     A      C      G      T
 99750 100468 100031  99751

     A      C      G      T
100002  99875 100449  99674

     A      C      G      T
 99953 100445  99751  99851
If size increases a lot, the relative frequencies get close to 1/4 each
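The effect of size can be sketched by repeating the experiment for several sizes and comparing the proportions. The sizes below match the totals of the tables above; the names sizes and rel_freq are illustrative.

```r
dna <- c("A", "C", "G", "T")
sizes <- c(40, 400, 4000, 40000, 400000)

# For each size, draw a sample and compute the proportion of each letter.
# factor() with fixed levels keeps all four letters even if one is never drawn.
rel_freq <- sapply(sizes, function(n) {
  draws <- factor(sample(dna, size = n, replace = TRUE), levels = dna)
  as.vector(table(draws)) / n
})
dimnames(rel_freq) <- list(dna, as.character(sizes))

# One column per size; the columns move closer to 0.25 as size grows.
print(round(rel_freq, 4))
```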
Absolute frequency is how many times we see each value
The sum of all absolute frequencies is the total number of cases
Relative frequency is the proportion of each value in the total
The sum of all relative frequencies is always 1.
       a        c        g        t
0.249790 0.249719 0.250288 0.250203
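The two kinds of frequency can be computed as in this sketch. The sample size of one million is an assumption, chosen because the proportions above have six decimals; the names abs_freq and rel_freq are illustrative.

```r
dna <- c("a", "c", "g", "t")
n <- 1000000  # assumed sample size

draws <- sample(dna, size = n, replace = TRUE)

# Absolute frequency: how many times we see each value.
abs_freq <- table(draws)
print(abs_freq)
print(sum(abs_freq))

# Relative frequency: the proportion of each value in the total.
rel_freq <- abs_freq / sum(abs_freq)
print(rel_freq)
print(sum(rel_freq))
```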
What will each relative frequency be when size is BIG?
We will see that the relative frequency will converge to the probability
In complex systems we do not know each probability
But we can estimate it using the relative frequency
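A sketch of this idea: a system with hidden probabilities is simulated, and the probabilities are then estimated from the relative frequencies alone. The values in hidden_prob are hypothetical and stand in for something we would not know in a real system. All names are illustrative.

```r
outcomes <- c("A", "C", "G", "T")

# Hypothetical probabilities; in a real system these would be unknown.
hidden_prob <- c(0.1, 0.4, 0.3, 0.2)

# Observe the system many times.
draws <- factor(
  sample(outcomes, size = 100000, replace = TRUE, prob = hidden_prob),
  levels = outcomes
)

# Estimate each probability with the relative frequency.
estimate <- as.vector(table(draws)) / length(draws)
names(estimate) <- outcomes
print(round(estimate, 3))

# How far the estimate is from the hidden values.
print(max(abs(estimate - hidden_prob)))
```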
That is what we will do in this course