This time we are going to use our tools to understand how cells use different codons for the same amino acid.
As you know, the same amino acid can be encoded by several codons. You can see this in the genetic code or in the table SEQINR.UTIL$CODON.AA. Remember that you need to load the seqinr library to get this table.
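A quick way to see this degeneracy is to count how many codons map to each amino acid. The sketch below is plain base R and assumes the standard genetic code (NCBI translation table 1), written as the usual 64-letter string in TCAG order. The name standard_code is hypothetical and only stands in for the codon-to-amino-acid lookup you would otherwise read from SEQINR.UTIL$CODON.AA.

```r
# Hypothetical stand-in for a codon -> amino acid lookup, assuming the
# standard genetic code (NCBI table 1). Stop codons are coded as "*".
bases <- c("t", "c", "a", "g")
codons <- paste0(rep(bases, each = 16),                 # first base varies slowest
                 rep(rep(bases, each = 4), times = 4),  # second base
                 rep(bases, times = 16))                # third base varies fastest
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
standard_code <- setNames(amino_acids, codons)

# Number of codons per amino acid (its degeneracy)
degeneracy <- table(standard_code)

# Codons grouped by the amino acid they encode
synonymous <- split(names(standard_code), standard_code)
synonymous[["L"]]  # "tta" "ttg" "ctt" "ctc" "cta" "ctg"
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, so the question of which synonymous codon a cell prefers only makes sense for the degenerate amino acids.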
- Write a function called codons_in_gene that takes a gene sequence (i.e. a vector of characters) as input and returns a table of the frequency of each codon in the gene. If you do this:
library(seqinr)
genes <- read.fasta("https://anaraven.bitbucket.io/static/NC_000913.ffn")
codons_in_gene(genes[[1]])
you should get this:
aaa aac aca acc agc atc atg att cgc gcg ggc ggt tga
1 1 1 7 1 1 1 3 1 1 1 2 1
- Write a function called codons_in_genome that takes a list of genes (such as the output of read.fasta) and returns a table of the total frequency of each codon over all the genes in the genome (a numeric vector with names). If you do this:
codons_in_genome(genes)
you should get this:
aaa aac aag aat aca acc acg act aga agc
44236 28319 13384 22756 8975 30972 18970 11577 2489 21131
agg agt ata atc atg att caa cac cag cat
1363 11322 5345 33331 36700 40171 20208 12814 38152 16937
cca ccc ccg cct cga cgc cgg cgt cta ctc
11058 7138 30969 9128 4523 29301 6983 27843 5072 14702
ctg ctt gaa gac gag gat gca gcc gcg gct
70390 14403 52330 25214 23456 42135 26535 33898 44900 19999
gga ggc ggg ggt gta gtc gtg gtt taa tac
10216 39366 14464 32655 14325 20227 34796 24031 2678 16079
tag tat tca tcc tcg tct tga tgc tgg tgt
287 21055 9154 11321 11747 10986 1178 8482 20060 6706
tta ttc ttg ttt
18085 21827 17992 29304
- We want to know the relative frequency of each codon among the codons that code for the same amino acid. For that, we need a function called codon_freq that takes an amino acid aa (a single letter) and the output of codons_in_genome (which we can call total_freq), and returns the relative frequency of the codons corresponding to the amino acid aa. This output is a numeric vector with names.
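For the first exercise, codons_in_gene, here is one possible sketch in base R. It assumes each gene is a vector of single characters read in frame from its first nucleotide, which is how read.fasta returns the entries of a .ffn file with its default options. It pastes the characters into one string, cuts that string into consecutive triplets with substring(), and tabulates them; a trailing partial codon, if any, is ignored. seqinr's splitseq() does the same kind of slicing and could replace the manual step.

```r
# Sketch of codons_in_gene(): count the codons of one gene, assuming the
# gene is a vector of single characters that starts in frame.
codons_in_gene <- function(gene) {
  sequence <- tolower(paste(gene, collapse = ""))  # one lowercase string
  n_codons <- nchar(sequence) %/% 3                # drop a trailing partial codon
  starts <- seq(1, by = 3, length.out = n_codons)  # 1, 4, 7, ...
  codons <- substring(sequence, starts, starts + 2)
  table(codons)                                    # counts, sorted by codon
}
```

With genes loaded as in the first exercise, codons_in_gene(genes[[1]]) should reproduce the counts listed there.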
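For codons_in_genome, a possible sketch under the same input assumptions splits every gene into codons, pools them, and tabulates them against a fixed set of all 64 codons. Using those codons as factor levels means a codon that never occurs still shows up with a count of 0, and the names come out in alphabetical order like the expected output. The helper split_codons is hypothetical, and codons containing letters other than a, c, g and t are simply left out by this approach.

```r
# Hypothetical helper: split one gene (a vector of single characters, in
# frame from its first base) into codons; a trailing partial codon is ignored.
split_codons <- function(gene) {
  sequence <- tolower(paste(gene, collapse = ""))
  starts <- seq(1, by = 3, length.out = nchar(sequence) %/% 3)
  substring(sequence, starts, starts + 2)
}

# Sketch of codons_in_genome(): total codon counts over a list of genes.
codons_in_genome <- function(genes) {
  bases <- c("a", "c", "g", "t")
  grid <- expand.grid(b1 = bases, b2 = bases, b3 = bases,
                      stringsAsFactors = FALSE)
  # all 64 codons in alphabetical order, so unseen codons get a count of 0
  all_codons <- sort(paste0(grid$b1, grid$b2, grid$b3))
  pooled <- unlist(lapply(genes, split_codons), use.names = FALSE)
  # codons with other letters (e.g. "n") become NA and table() drops them
  table(factor(pooled, levels = all_codons))
}
```

Counting everything in one table() call, rather than adding up one table per gene, avoids having to align tables whose sets of codons differ from gene to gene.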
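Finally, a possible sketch of codon_freq. The exercise points to SEQINR.UTIL$CODON.AA for the codon-to-amino-acid mapping; to keep this example self-contained it instead hard-codes the standard genetic code (NCBI table 1) as a 64-letter string in TCAG order, under the hypothetical name standard_code. The function looks up the synonymous codons for aa, takes their counts from total_freq, and divides by their sum, so the result adds up to 1. A codon missing from total_freq is counted as 0.

```r
# Sketch of codon_freq(), assuming the standard genetic code (NCBI table 1)
# instead of a lookup in SEQINR.UTIL$CODON.AA. Stop codons are "*".
bases <- c("t", "c", "a", "g")
standard_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
           "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), times = 4),
         rep(bases, times = 16))
)

codon_freq <- function(aa, total_freq) {
  # codons that encode the requested amino acid, in alphabetical order
  synonymous <- sort(names(standard_code)[standard_code == toupper(aa)])
  # plain named numeric vector, so lookups by name behave like a vector
  counts <- setNames(as.numeric(total_freq), names(total_freq))[synonymous]
  counts[is.na(counts)] <- 0   # codons absent from total_freq count as 0
  names(counts) <- synonymous  # missing lookups would otherwise be named NA
  counts / sum(counts)         # NaN if none of the codons was observed
}
```

As a check against the genome totals listed in the second exercise, the six leucine codons there are cta 5072, ctc 14702, ctg 70390, ctt 14403, tta 18085 and ttg 17992, which sum to 140644. So codon_freq("L", total_freq) should give ctg a relative frequency of 70390 / 140644, about 0.50: roughly half of all leucine codons in the genome are ctg.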