Work on this list every day without exception, at least 25 minutes without interruption. Use an alarm clock to know when to stop. Do not stop until the alarm rings. Always stop when the alarm rings and do something else for at least 5 minutes.
If you can, repeat this once every day. If you do it twice you have one hour every day, roughly equivalent to one day. That is exactly one more day than most people have studied so far, so doing this will be a huge advantage.
Things to avoid:
- Don’t look up the answers on the Internet. They are probably not there anyway, and you would miss the chance to think on your own.
- Don’t work alone. Find one or more friends and explain your ideas to them. Speaking out loud helps you think. If you live alone on top of a mountain and do not have any friends (I’m very sorry for you, really), you can use the course forum, or make your own WhatsApp group or Facebook page. Or send handwritten letters, as people have been doing for the last 3000 years.
- Don’t just read someone else’s answer. Be sure to understand the solution, try to do it in a different way, and write your own. Always write it, using your hands.
1. Computational thinking
1.1 Exploring vectors
You will program your own version of some standard functions using only for(), if() and indices. All the following functions receive a vector.
Please write your own version of the following functions:
- vector_min(x), equivalent to min(x). Returns the smallest element in x.
- vector_max(x), equivalent to max(x). Returns the largest element in x.
- vector_which_min(x), equivalent to which.min(x). Returns the index of the smallest element in x.
- vector_which_max(x), equivalent to which.max(x). Returns the index of the largest element in x.
- vector_mean(x), equivalent to mean(x). Returns the average of all elements in x.
- vector_cumsum(x), equivalent to cumsum(x). Returns a vector of the same length as x with the cumulative sum of x.
- vector_diff(x), equivalent to diff(x). Returns a vector one element shorter than x with the differences between consecutive elements of x.
- vector_apply(x, f), equivalent to sapply(x, f). Inputs are a vector x and a function f. Returns a new vector y of the same length as x where y[i] is f(x[i]) for all i.
You can test your function with the following code.
x <- sample(5:20, size=10, replace=TRUE)
min(x)
vector_min(x)
The two results must be the same. Obviously, you have to replace min and vector_min with the corresponding functions.
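The comparison above can be repeated automatically on many random vectors. The following is only a sketch: check_equivalent() is a hypothetical helper name, not part of the exercise, and the vector_min() shown is one possible version among many, included just so the helper has something to check. Try writing your own before reading it.

```r
# Hypothetical helper (an assumption, not part of the exercise):
# compares a hand-written function with the built-in one on random vectors.
check_equivalent <- function(mine, builtin, trials = 20) {
  for (t in 1:trials) {
    x <- sample(5:20, size = 10, replace = TRUE)
    if (!isTRUE(all.equal(mine(x), builtin(x)))) {
      return(FALSE)   # the two functions disagree on this vector
    }
  }
  TRUE                # the two functions agreed on every vector tried
}

# One possible vector_min(), using only for(), if() and indices.
vector_min <- function(x) {
  ans <- x[1]                 # best candidate seen so far
  for (i in 1:length(x)) {
    if (x[i] < ans) {
      ans <- x[i]             # found a smaller element
    }
  }
  ans
}

check_equivalent(vector_min, min)
```

The same helper can be reused for the other pairs, for example check_equivalent(vector_cumsum, cumsum) once you have written vector_cumsum().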
1.2 Merging vectors
Please write a function called vector_merge(x, y) that receives two sorted vectors x and y and returns a new vector with the elements of x and y together, sorted. The output vector has size length(x)+length(y).
You must assume that each of the input vectors is already sorted.
For that you have to use three indices: i, j, and k, to point into x, y, and the output vector ans. On each step you have to compare x[i] and y[j]. If x[i] < y[j] then ans[k] <- x[i], otherwise ans[k] <- y[j].
You have to increment i or j, and k, carefully. To test your function, you can use this code:
a <- sample(letters)
x <- sort(a[1:13])
y <- sort(a[14:26])
vector_merge(x, y)
The output must be a sorted alphabet.
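If you get stuck, the three-index idea described above could look roughly like the sketch below. It is one possible version, not the official solution. The handling of the case where one input is already used up (the take_x variable) is an assumption added here, since the text only says to increment the indices carefully.

```r
# A sketch of vector_merge() with three indices: i into x, j into y, k into ans.
vector_merge <- function(x, y) {
  n <- length(x) + length(y)
  ans <- c(x, y)              # placeholder of the right type and length
  i <- 1
  j <- 1
  for (k in seq_len(n)) {
    if (j > length(y)) {
      take_x <- TRUE          # y is used up, only x has elements left
    } else if (i > length(x)) {
      take_x <- FALSE         # x is used up, only y has elements left
    } else {
      take_x <- x[i] < y[j]   # both have elements, compare the fronts
    }
    if (take_x) {
      ans[k] <- x[i]
      i <- i + 1
    } else {
      ans[k] <- y[j]
      j <- j + 1
    }
  }
  ans
}

vector_merge(c(1, 3, 5), c(2, 4))
```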
1.3 Sorting
Please write a function called vector_mergesort(x) that takes a single vector x and returns a new vector with the same elements as x but sorted from the smallest to the largest.
To do so you have to use a recursive strategy as follows:
- If the input vector x has length 1, then it is already sorted. In that case the output is a copy of x.
- If the length of the input is larger than 1 then you split x in two parts. The new vector x1 contains the first half of x, and x2 has the second half.
- Be careful when length(x) is odd.
- Now you have to sort x1 and x2 by using the same function vector_mergesort(). Store the results in ans1 and ans2.
- Finally you have to merge ans1 and ans2 using the function vector_merge() of the previous exercise, and return the merged vector.
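The warning about odd lengths in the steps above is about the split point. A minimal sketch of the splitting step only, assuming length(x) is at least 2, is shown here; split_halves() is a hypothetical helper name used for illustration, and the recursion itself is left to you.

```r
# Hypothetical helper (an assumption, not part of the exercise):
# splits x into a first and a second half, for length(x) >= 2.
split_halves <- function(x) {
  n <- length(x)
  mid <- n %/% 2              # integer division: 5 %/% 2 is 2, not 2.5
  list(x1 = x[1:mid],         # first half
       x2 = x[(mid + 1):n])   # second half, one element longer when n is odd
}

split_halves(c(7, 3, 9, 1, 5))
```

Using n / 2 directly would give a fractional midpoint such as 2.5 when n is odd, and indexing with fractional numbers can silently lose elements.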
2. Random processes
Please write a function called my_sample(x, size, replace, prob), equivalent to the function sample(x, size, replace, prob), using only sample.int(n, size, replace, prob).
Simulate an experiment with N independent dice. The result of the experiment is the sum of all dice.
- Plot the histogram of the result for 100 replicas, for different values of N. You can write a function for this.
- Plot the average of the results of 100 replicas, depending on different values of N such as 10, 1010, 2010, …, 2E4.
- What is the relationship between the averages of the results and N? Build a linear model and explain the result.
- Use the quantile(x, ...) function to find a 95% confidence interval for the result of the experiment.
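As a starting point for the dice experiment, one replica and its repetition could be simulated as in the sketch below. The function name dice_sum() and its default of 100 replicas are assumptions made for illustration. The plots for several values of N and the linear model are left to you.

```r
# Hypothetical helper: the sum of N dice, repeated `replicas` times.
dice_sum <- function(N, replicas = 100) {
  replicate(replicas, sum(sample(1:6, size = N, replace = TRUE)))
}

results <- dice_sum(N = 10)
hist(results, main = "Sum of 10 dice, 100 replicas")

# Empirical 95% interval: the range holding the central 95% of the replicas.
quantile(results, probs = c(0.025, 0.975))
```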
Simulate an experiment with N independent coins. The two sides of each coin are labeled +1 and -1. The result of the experiment is the sum of all coin labels.
- Plot the histogram of the result for 100 replicas, for different values of N.
- Plot the average of the result depending on different values of N, like 10, 1010, 2010, …, 2E4. What is the relationship?
- Write a function called squared_vector(N) taking N as input, simulating 400 replicas, and returning a vector with the square of each replica. For example, if the replicas are c(1,-2,0,...,-1,3), the function must return c(1,4,0,...,1,9). Hint: you can take the square before doing the replicas.
- Plot the mean of the output of squared_vector(N) versus N for different values of N, like 10, 1010, 2010, …, 2E4.
- What is the relationship between the mean of the squares of the results and N? Build a linear model and explain the result.
How many times do you have to throw a die to get a 6? Give the average and a 95% confidence interval.
For this and the following questions you can use either quantile(x) or the formulas from Question 4.
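For the waiting-time questions, the simulation approach with quantile(x) could be organized as in this sketch for the first question. The function name throws_until_six() and the choice of 1000 replicas are assumptions. The later questions need a different stopping rule inside the loop.

```r
# Hypothetical helper: throws one die repeatedly, returns how many throws
# were needed until the first 6 appeared.
throws_until_six <- function() {
  count <- 0
  repeat {
    count <- count + 1
    if (sample(1:6, size = 1) == 6) {
      return(count)
    }
  }
}

waits <- replicate(1000, throws_until_six())
mean(waits)                                  # average number of throws
quantile(waits, probs = c(0.025, 0.975))     # empirical 95% interval
```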
How many times do you have to throw a die to get two consecutive 6s? Give the average and a 95% confidence interval.
How many times do you have to throw a die to get two 6s, consecutive or not? Give the average and a 95% confidence interval.
How many times do you have to throw two dice to get a sum equal to 6? Give the average and a 95% confidence interval.
We have six lamps labeled 1 to 6. Initially they are all turned off. You throw a die and get a number x. Then you switch the lamp that has the label x. How many times do you have to throw the die until all six lamps are turned on? Give a range that is valid at least 95% of the time.
What is the effect of the read length on the number of contigs? Assume shotgun assembly of a genome of size 1E6, and make a plot for different read lengths and numbers of reads.
3. Hypothesis testing: Blind test of cola normal v/s zero
We want to know if you can taste the difference between normal cola and sugarless cola. To test this, we prepare 8 cups that look identical. Four of the cups are filled with normal cola, the other four cups have cola zero. The 8 cups are randomly shuffled using sample.int(8). We write the shuffling order on a paper and hide it in an envelope that you cannot see.
You taste all of them and you write down which ones you believe are normal cola and which ones are zero. For example, you can write that cups 2, 3, 5 and 7 have cola zero. Then we open the envelope and compare your results to the original order, and we find that you guessed all of them correctly. Then we have two possible explanations:
- hypothesis zero: there is no difference between cola and zero, you just chose randomly and were lucky
- hypothesis one: you can tell the difference and you guess correctly
What is the probability of choosing correctly just by luck (i.e. under hypothesis zero)?
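Besides counting the possible answers by hand, you can check your result with a simulation of hypothesis zero. The sketch below assumes that the first four cups of the shuffled order are the ones holding cola zero and that a taster with no ability picks four cups at random; both are modeling assumptions, and all_correct() is a hypothetical helper name.

```r
# Hypothetical helper: one blind test under hypothesis zero.
# Returns TRUE when a purely random guess identifies all cups correctly.
all_correct <- function() {
  shuffled <- sample.int(8)
  zero_cups <- shuffled[1:4]          # assumption: these four hold cola zero
  guess <- sample.int(8, size = 4)    # four cups chosen at random as "zero"
  setequal(zero_cups, guess)
}

# Relative frequency of a perfect score when guessing at random.
mean(replicate(20000, all_correct()))
```

Compare the simulated frequency with the value you get by counting how many ways there are to choose 4 cups out of 8.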
4. Theory: event frequency v/s event probability
- You want to do an experiment where the probability of an event is 0.70. How many replicas do you need to guarantee that the relative frequency of the event in the experiment is between 0.65 and 0.75 at least 95% of the time? What is the formula to answer that question?
- You simulated a process with 100 replicas. The relative frequency of the event is 0.7. What is the 95% confidence interval for the real probability? What is the formula to answer that question?
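Whatever formula you find for the first question, you can check it empirically. The sketch below estimates, for a given number of replicas n, how often the relative frequency falls between 0.65 and 0.75. The function name coverage() and its default arguments are assumptions made for illustration, and the simulation does not replace the formula the question asks for.

```r
# Hypothetical helper: fraction of simulated experiments in which the
# relative frequency of the event lies between lo and hi.
coverage <- function(n, p = 0.70, lo = 0.65, hi = 0.75, experiments = 10000) {
  # Each experiment has n replicas; rbinom() counts how often the event occurs.
  freq <- rbinom(experiments, size = n, prob = p) / n
  mean(freq >= lo & freq <= hi)
}

# Try several numbers of replicas and see where the fraction passes 0.95.
for (n in c(50, 100, 200, 400, 800)) {
  cat(n, coverage(n), "\n")
}
```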