sample()
functionWe will use the function
x
is a vector of possible outcomessize
is the number of elements to takereplace
is TRUE
to allow repeated outcomesprob
is the proportion of each possible outcome in a very large populationsample(x)
just shuffles x
sample(x, size)
first shuffles x
, then takes size
elementssample(x, size, replace=TRUE)
allows us to have size
larger than length(x)
sample(x, size, replace=TRUE, prob)
changes the probability of each possible outcomeMaking experiments in the computer
First, let’s give a name to the possible outcomes
The easiest way to get a simulated DNA sequence is
[1] "G" "G" "G" "C" "G"
We want to see what happens with our experiment if we could replicate it several times
The easiest way is to use replicate()
[,1] [,2] [,3]
[1,] "C" "T" "G"
[2,] "C" "C" "T"
[3,] "C" "C" "A"
[4,] "G" "A" "G"
[5,] "A" "C" "G"
replicate
, not rep
eatThis function can be confused with rep()
With replicate
, the function sample
is used 3 independent times
[,1] [,2] [,3]
[1,] "A" "G" "A"
[2,] "T" "T" "C"
[3,] "A" "C" "G"
[4,] "A" "G" "T"
[5,] "A" "C" "C"
replicate
, not rep
eatWith rep
, the function sample
is used one time, and the result is copied 3 times. It is not random
[1] "A" "G" "G" "A" "T" "A" "G" "G" "A" "T" "A" "G" "G" "A" "T"
Also, notice that the number of replicas/repetitions is written in different places
replicate
output can be a listBy default, the output is simplified into a matrix. To get a list instead, we use
[[1]]
[1] "G" "T" "C" "A" "G"
[[2]]
[1] "A" "A" "C" "G" "G"
[[3]]
[1] "T" "A" "G" "A" "G"
The GC content of C.rudii is 17%. To simulate its DNA, we need to indicate the proportion of each nucleotide.
The vector prob
must have the same length as DNA_alphabet
[1] "A" "A" "T" "G" "A" "T" "A" "A" "A" "T" "T" "T" "T" "T" "T" "T"
[17] "T" "T" "A" "T" "A" "A" "T" "A" "A" "A" "T" "T" "T" "A" "T" "C"
[33] "G" "G" "A" "A" "T" "A" "T" "A" "A" "T" "A" "T" "T" "T" "A" "T"
[49] "C" "T" "G" "G" "T" "A" "A" "C" "A" "A" "C" "T" "A" "T" "C" "T"
[65] "A" "T" "A" "A" "A" "A" "C" "A" "A" "A" "T" "T" "G" "T" "T" "T"
[81] "T" "T" "T" "C" "T" "A" "A" "A" "A" "G" "A" "A" "A" "A" "T" "G"
[97] "T" "A" "A" "A"
We call “coin” any experiment that has two possible outcomes
A fair coin has 50% probability for each outcome
The interesting case is when the probabilities are different
Let’s try a 10% coin
[1] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"
and an 80% coin
[1] "H" "H" "T" "H" "H" "H" "H" "H" "T" "H" "T" "H" "H" "H" "H"
Remember to write all probabilities
If each coin is a test, we want to know how many tests were successful.
[1] 5
This simulates, for example, growing 20 plants and counting how many of them survive
We can wrap all in a function like this
num_successes <- function(size, p) {
coins <- sample(c("H","T"), replace=TRUE, size=size, prob=c(p, 1-p))
return(sum(coins=="H"))
}
or, better
num_successes <- function(size, p) {
coins <- sample(c(1, 0), replace=TRUE, size=size, prob=c(p, 1-p))
return(sum(coins))
}
(notice that you must say size=size
)
[1] 6 3 9 5 8 6 5 5 7 7 4 5 6 6 5 5 9 6 9 4 10
[22] 9 9 7 2 5 8 3 6 3 6 6 6 5 7 7 7 3 7 5 4 2
[43] 7 7 9 5 7 4 5 7 8 4 5 7 8 6 3 3 4 3 9 7 9
[64] 12 8 5 4 5 5 5 5 9 8 3 9 6 8 9 5 5 5 9 6 5
[85] 6 4 3 5 5 3 5 7 8 6 4 9 5 5 8 9 3 7 7 5 6
[106] 2 7 5 6 8 4 5 6 4 8 5 10 4 5 7 5 4 7 4 7 9
[127] 8 4 7 6 10 7 7 6 8 8 4 6 5 6 8 7 8 7 6 7 5
[148] 4 6 6 6 6 6 6 5 6 6 8 6 2 7 5 1 5 8 3 8 7
[169] 10 3 4 3 8 7 6 7 6 8 10 6 7 5 8 10 2 6 8 6 7
[190] 5 8 8 4 7 5 8 5 4 6 5 2 4 7 7 3 6 2 4 7 6
[211] 8 9 4 8 1 5 7 2 10 4 10 7 9 2 5 7 7 8 6 6 4
[232] 3 5 4 6 3 7 4 6 7 9 9 7 6 6 7 9 5 4 10 8 6
[253] 3 3 6 4 1 6 9 4 3 8 6 7 4 6 6 7 8 1 6 6 5
[274] 4 4 7 6 6 6 5 6 5 7 5 10 6 7 8 6 7 9 8 4 7
[295] 3 4 8 8 6 9 8 4 2 3 2 4 4 11 2 5 4 10 8 5 9
[316] 4 6 4 6 6 7 6 8 7 9 6 5 5 6 5 3 4 6 4 8 9
[337] 5 3 5 6 8 6 5 6 8 1 7 9 5 8 7 4 5 7 8 3 10
[358] 7 4 7 2 6 6 7 9 7 5 3 5 3 3 5 5 4 8 3 4 5
[379] 7 6 9 4 10 7 7 6 8 5 6 6 8 7 3 4 7 6 4 8 7
[400] 8 4 4 6 8 6 4 4 5 4 5 5 5 4 4 6 4 8 8 7 7
[421] 6 7 5 7 4 3 5 7 5 5 6 3 6 5 5 5 6 7 5 7 7
[442] 7 5 9 8 5 4 8 5 5 4 4 10 4 3 5 8 6 5 7 4 3
[463] 9 8 6 4 6 4 7 5 3 5 8 4 5 7 7 7 7 6 5 6 6
[484] 7 6 9 9 4 6 7 9 2 7 7 3 9 4 9 4 7 4 3 4 7
[505] 6 6 9 4 5 10 4 6 4 3 5 8 6 4 4 7 9 10 3 6 7
[526] 5 4 2 4 9 6 7 10 7 7 7 7 7 6 5 8 7 2 8 5 6
[547] 5 9 4 9 6 3 8 6 6 6 9 7 7 4 8 6 9 7 8 10 5
[568] 7 5 7 8 9 5 4 5 6 3 6 6 7 4 8 7 8 9 4 4 6
[589] 4 9 6 6 7 4 4 6 5 7 8 5 5 2 3 10 7 7 7 6 8
[610] 6 5 6 7 7 7 7 6 8 5 5 3 6 5 7 5 6 4 6 9 4
[631] 5 6 6 8 7 11 4 5 7 4 9 5 6 11 2 6 8 6 7 3 6
[652] 9 4 5 5 7 5 6 4 6 8 2 5 4 5 5 5 4 5 3 7 7
[673] 3 6 5 6 8 10 8 3 7 8 3 3 9 3 6 7 8 3 9 8 11
[694] 8 8 7 4 2 5 5 5 4 6 5 5 5 9 6 5 7 4 5 10 11
[715] 5 9 3 9 4 5 9 6 4 7 4 3 6 7 3 8 8 4 7 6 5
[736] 6 4 6 7 6 5 2 8 4 4 7 5 5 9 7 5 4 6 5 7 8
[757] 5 8 5 9 6 4 5 3 6 6 9 3 6 4 7 5 2 5 9 5 5
[778] 6 7 2 8 2 5 4 8 4 6 7 7 8 5 7 4 9 3 5 4 9
[799] 3 8 7 7 7 8 6 5 5 10 5 3 4 6 7 6 10 5 7 7 7
[820] 4 9 8 9 10 5 3 6 5 9 8 3 10 7 7 4 7 6 8 8 7
[841] 5 7 6 2 10 4 6 8 10 5 5 6 6 9 8 8 2 3 4 4 10
[862] 9 4 4 4 7 3 7 6 6 6 7 7 10 1 7 9 8 7 6 5 8
[883] 8 3 6 5 3 5 4 5 6 6 6 5 9 9 3 5 7 6 5 3 8
[904] 4 8 5 6 4 5 4 6 6 10 4 3 4 3 3 10 3 5 4 2 4
[925] 3 4 3 6 5 5 7 11 7 5 7 6 9 1 9 5 3 10 4 5 8
[946] 8 6 3 4 5 5 6 4 8 8 9 6 7 5 3 6 3 8 5 3 5
[967] 3 8 4 7 8 6 2 5 7 6 5 6 9 7 4 6 4 6 6 7 7
[988] 3 6 6 5 8 2 5 7 4 4 6 6 3
Counting the number of successful tests is one of the most common cases in experimental sciences
Thus, there is a better way to simulate this case
or better
[1] 4 9 8 8 6 4 5 7 4 6 7 7 13 5 8 5 4 8 8 6 7
[22] 7 4 5 5 1 10 5 8 6 4 10 8 6 6 2 9 5 5 4 8 7
[43] 4 5 8 7 6 8 6 4 7 8 6 4 3 6 5 5 5 2 7 5 7
[64] 2 5 3 4 6 11 7 7 7 4 5 3 7 3 8 5 5 6 7 8 3
[85] 6 5 3 6 8 7 6 4 5 5 6 6 6 8 6 8 4 6 5 7 5
[106] 7 4 4 10 11 2 6 4 7 3 3 6 5 4 6 8 8 7 7 7 6
[127] 6 7 4 7 3 8 5 8 5 7 5 4 6 5 7 5 3 9 8 5 5
[148] 5 8 10 2 9 7 6 5 7 4 4 2 7 6 5 3 6 6 4 4 2
[169] 4 3 6 8 5 3 5 5 6 4 5 6 8 7 9 5 4 4 6 7 6
[190] 6 5 9 4 6 9 8 5 5 3 8 7 6 5 5 6 6 6 6 4 10
[211] 9 8 5 5 3 6 6 8 7 3 6 6 6 6 4 8 4 7 5 8 4
[232] 7 6 8 7 11 5 6 10 8 7 8 8 3 5 6 8 5 2 4 5 4
[253] 3 5 2 5 5 8 5 7 5 6 4 6 5 10 7 3 4 6 3 7 7
[274] 6 7 5 10 7 9 6 5 5 4 6 5 6 7 6 7 8 7 8 7 8
[295] 4 5 5 8 4 6 7 8 8 7 4 5 6 6 7 5 4 5 2 10 2
[316] 9 2 5 4 5 9 9 7 6 5 7 5 6 5 8 6 7 6 5 6 8
[337] 4 8 4 7 4 9 7 5 5 6 6 5 9 8 6 6 4 5 9 1 8
[358] 6 4 6 5 12 6 4 8 8 3 7 10 6 4 7 5 8 8 6 4 5
[379] 6 5 4 2 7 5 8 6 6 9 6 7 7 8 7 4 6 7 7 4 5
[400] 7 6 10 7 3 8 6 6 6 7 7 7 10 7 2 5 8 9 5 4 6
[421] 5 8 8 9 7 9 6 10 5 2 8 5 6 5 5 7 3 4 10 5 4
[442] 6 1 2 2 7 9 6 6 8 8 4 2 6 5 8 4 2 7 8 4 6
[463] 8 3 4 4 6 4 7 6 4 9 4 6 6 5 8 5 5 7 5 6 2
[484] 7 8 6 4 6 8 4 7 4 7 4 1 6 3 6 6 11 8 8 7 8
[505] 6 7 7 5 7 7 6 6 5 4 4 7 3 4 7 11 4 5 8 6 6
[526] 4 6 3 5 7 6 7 3 9 10 4 7 5 6 11 8 8 5 6 7 5
[547] 4 5 9 4 8 8 4 6 8 5 8 4 8 10 8 7 5 5 7 6 10
[568] 4 9 5 6 4 4 8 4 8 9 8 10 5 8 5 7 0 9 7 6 4
[589] 7 10 8 5 7 6 5 8 0 5 7 7 4 8 5 6 9 10 12 5 6
[610] 8 3 7 7 6 4 9 7 6 4 6 5 6 5 7 8 4 6 3 5 4
[631] 7 5 8 8 6 7 9 6 8 8 7 8 6 3 8 6 5 6 5 9 5
[652] 5 4 6 9 8 6 9 10 8 4 6 7 7 4 2 4 3 7 3 7 8
[673] 5 8 6 8 5 7 4 6 6 7 4 7 9 9 6 4 4 7 10 3 7
[694] 2 7 3 7 2 5 4 6 3 8 10 8 8 5 5 3 5 1 9 7 6
[715] 7 4 2 6 10 6 9 6 3 11 4 8 2 7 9 5 6 9 7 8 6
[736] 3 6 5 4 9 1 2 8 5 7 6 3 4 6 9 6 5 9 5 9 3
[757] 7 6 3 5 4 5 4 5 7 8 4 5 5 5 5 6 4 7 6 2 3
[778] 3 4 6 8 8 6 6 6 5 6 6 4 8 7 8 5 4 6 8 4 5
[799] 6 7 8 6 7 6 8 7 3 11 7 6 8 4 4 8 5 5 8 6 6
[820] 4 7 7 6 3 8 4 8 7 8 9 9 8 6 8 9 3 7 7 7 4
[841] 8 6 5 9 7 3 7 8 8 2 5 3 3 6 4 9 5 5 9 4 5
[862] 6 8 8 8 8 10 6 4 8 5 6 7 6 4 6 6 8 5 4 4 4
[883] 7 5 7 6 7 7 5 7 5 10 8 5 2 10 4 9 8 8 10 7 9
[904] 6 9 2 4 3 8 4 6 6 8 9 6 4 4 7 7 2 6 11 5 6
[925] 4 2 8 8 8 9 5 8 9 8 6 5 8 9 6 5 5 2 4 10 5
[946] 8 7 7 3 3 9 8 5 6 5 8 6 10 8 5 8 7 5 9 5 8
[967] 3 6 1 5 6 7 4 7 4 4 3 3 6 7 8 6 8 4 7 6 5
[988] 8 6 7 3 6 1 5 8 6 6 5 8 6
DNA and coins are random systems that produce symbols
Other cases, like num_sucesses
, produce random numbers
When the outcomes are numbers, we say that the result is a random variable
There are several functions to generate random variables with different distributions
So far we used sample
to mix integer numbers, and rbinom
to simulate number of successes
We can also generate real random numbers, with different distributions
In this course we will use
We use this method when we only know the range where the random variable can be
To generate 12 random real numbers between 0 and 1, we use
[1] 0.43632997 0.85091620 0.65410589 0.26664259 0.09386964 0.83004632
[7] 0.23017404 0.81131515 0.82213878 0.76379385 0.02908118 0.74287759
We can choose any range for the numbers
[1] 17.65299 17.58306 20.47430 21.57560 15.89224 20.56096 17.84828
[8] 15.04770 22.23678 20.03353 15.25201 20.63179
We use this method when we know that the random variables are located around a mean
and have a dispersion sd
To generate 14 random numbers with a Normal distribution centered on 40 with a dispersion of 5, we use
[1] 38.51761 41.32713 32.43647 40.26195 45.84640 40.65946 37.46025
[8] 47.69482 40.99083 34.53998 42.94042 36.80723 36.84783 42.61253
We will study this case in detail, but later.
sample(x,…)
can mix any set x
of possible outcomesrbinom(n, p, size)
gives the number of successes in size
trials, n
timesrunif(n, min, max)
gives n
real numbers between min
and max
rnorm(n, mean, sd)
gives n
real numbers centered around mean
, with dispersion sd