To explain question 2.4 I will show an example. First, please take a look at the description from the original question. Here we have numbered the steps, so we can talk about them.
- For each amino-acid AA:
- Find the codons that encode AA it and store them in a
vector called
codons_for_aa
. - Find the maximum of
total_codon_count
among the codons incodons_for_aa
. Look only at these codons, ignore the rest. Store this maximum inmax_count_for_aa
. - For each codon in
codons_for_aa
:- Find the value of
total_codon_count
for that particular codon. - Divide it by
max_count_for_aa
. - Calculate the logarithm of the result of the last division, and
store it in
codon_adaptness
using the codon as the index. - Repeat for each codon in
codons_for_aa
.
- Find the value of
- Repeat for each amino acid.
- Find the codons that encode AA it and store them in a
vector called
If you translate this description from English to R, then you will have the answer.
From this description you can see that there are two
for()
loops. They are nested, one inside the other.
Remember that in this case you must use different names of variables on
each loop. For example i
and j
—or any other
word. You cannot use i
and i
, because the
computer will be confused. It is a good idea to choose names that remind
you of the things you are looking at (amino-acid, codon).
The key of this question is to use genetic_code
as a
tool, not as an problem. Some people tries to use unlist()
—a command that we never used in classes— to transform the list into a
vector. But in that case you get all the codons, not only the
ones for a specific amino acid. Too complicated, and wrong.
The vector names(genetic_code)
has all the amino-acid’s
names. Each element of the list genetic_code
is a vector,
containing only the codons of the corresponding amino-acid. If
aa
is the name of an amino-acid, then
genetic_code[[aa]]
will be a vector containing the codons
that encode aa
.
The first loop must take each amino-acid name, one by one. It will go
through "Ala"
, then "Arg"
, then
"Asn"
, up to "Val"
.
The vector total_codon_count
can be indexed using the
codon names. Thus, if you use codons_for_aa
as an index for
total_codon_count
, then you will be working only among
the codons in codons_for_aa
, as requested.
Let’s follow the instructions step by step.
Step 1 says “Find the codons that encode one amino-acid”.
For example, let’s say that the main for()
loop has taken
the amino-acid name "Val"
. Using this name as an index for
genetic_code
, we can find that the codons for this
amino-acid are "gta", "gtc", "gtg", "gtt"
. We store these
values in the vector codons_for_aa
. Obviously we do not do
this by hand. Instead you tell the computer to do it for us.
Step 2 says “Find the maximum of total_codon_count
among the codons in codons_for_aa
.” Let’s say that
there are 560 valines in total (in all the genes). We use
codons_for_aa
as index for total_codon_count
and get the count of each codon:
gta gtc gtg gtt
410 110 10 30
This small vector is what we call “only the codons of valine”. We ignore for now the rest of the codons, since in this part we are working only with valine.
The maximum value of this small vector is 410. We store that value in
max_count_for_aa
. We finished step 2.
In step 3 we have a second for()
loop, going through the
codons in codons_for_aa
one by one. Let’s say that the
codon is "gtc"
(we work on the other codons later). In step
3.1 we use this codon as an index to look inside
total_codon_count
and we find the value 110. That is the
count of "gtc"
.
In step 3.2 we divide that value by max_count_for_aa
. We
get 110/410 or in other words 0.2682927. In step 3.3 we calculate
log(110/410)
and we get -1.315677. We store this calculated
value into the corresponding element of vector
codon_adaptness
. We have to be careful to use the correct
index.
At the end, as usual, it is all about using the correct index in the correct order.
We do this for each codon of each amino-acid in
genetic_code
.
(Notice that this is not the relative frequency. We are calculating something else here, so the formula is something else.)