This week's homework is to replicate the tables and graphics of the Comparative Genometrics website, which has precomputed statistics for the DNA sequences of several thousand bacteria.
Please take a look at the E. coli K-12 page on Comparative Genometrics. You can see that the graphics are based on the table CP009685.txt.
Please write the R code to read the genome of E. coli and produce a table equivalent to CP009685.txt.
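As a starting point for the reading step, here is a minimal sketch in base R. It assumes the genome was downloaded as a FASTA file with a single record; the helper name read_genome and the toy file are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper: read a single-record FASTA file and return a
# character vector with one element per nucleotide.
read_genome <- function(path) {
  lines <- readLines(path)
  # Drop the header line(s), which start with ">"
  seq_lines <- lines[!startsWith(lines, ">")]
  # Join the sequence lines and split them into single upper-case letters
  strsplit(toupper(paste(seq_lines, collapse = "")), "")[[1]]
}

# Tiny demonstration with a made-up sequence written to a temporary file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">toy sequence", "ACGTAC", "GGTTAA"), tmp)
genome <- read_genome(tmp)
length(genome)   # number of nucleotides read
```

For the real genome you would pass the path of the downloaded file instead of the temporary one.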
You may see that the step size is 1000 nt, the column pos is the average of start and end, the columns nA, nC, nG and nT are the output of table(), and GCsk and TAsk are very easy to calculate.
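The description above can be sketched as follows. This is only one possible reading: it assumes non-overlapping windows, 1-based start and inclusive end, that incomplete windows at the end are dropped, and that the skews are defined as (G - C)/(G + C) and (T - A)/(T + A). Please check these assumptions against the values in CP009685.txt. The function name window_stats and the toy sequence are made up for illustration.

```r
# Hypothetical helper: per-window nucleotide counts and skews.
# 'genome' is a character vector with one nucleotide per element.
window_stats <- function(genome, step = 1000) {
  bases <- c("A", "C", "G", "T")
  # Start positions of full windows only (needs length(genome) >= step)
  starts <- seq(1, length(genome) - step + 1, by = step)
  ends <- starts + step - 1
  # factor() with fixed levels keeps a count of 0 for absent letters;
  # the result is a 4 x (number of windows) matrix in the order of 'bases'
  counts <- vapply(starts, function(s) {
    as.vector(table(factor(genome[s:(s + step - 1)], levels = bases)))
  }, integer(4))
  tab <- data.frame(pos = (starts + ends) / 2, start = starts, end = ends,
                    nA = counts[1, ], nC = counts[2, ],
                    nG = counts[3, ], nT = counts[4, ])
  # Assumed skew definitions (verify against the website's table)
  tab$GCsk <- (tab$nG - tab$nC) / (tab$nG + tab$nC)
  tab$TAsk <- (tab$nT - tab$nA) / (tab$nT + tab$nA)
  tab
}

# Toy example: 20 nucleotides, windows of 10
toy <- strsplit("GGGGCCATTTCCCCGGAAAT", "")[[1]]
tab <- window_stats(toy, step = 10)
print(tab)
```

With the real genome you would keep the default step = 1000.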
You have to research and understand how to make the columns cGCsk and cTAsk. The function cumsum() may be useful, but you can do the same with a for loop.
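To illustrate the hint, the sketch below shows that cumsum() and a for loop give the same running total. It assumes that cGCsk is simply the running sum of GCsk, which you should verify yourself against CP009685.txt; the skew values are made up.

```r
# Made-up per-window skew values, only for illustration
GCsk <- c(0.2, -0.1, 0.4, -0.3)

# Vectorised version
cGCsk <- cumsum(GCsk)

# Same result with an explicit loop
cGCsk_loop <- numeric(length(GCsk))
running <- 0
for (i in seq_along(GCsk)) {
  running <- running + GCsk[i]   # add the current window to the total
  cGCsk_loop[i] <- running
}

# TRUE when both versions agree (up to floating-point tolerance)
all.equal(cGCsk, cGCsk_loop)
```

The same pattern would apply to TAsk and cTAsk under the same assumption.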
Good luck with your work.
PS. Can you make a function to produce the reverse complement of a DNA sequence?
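One possible sketch for the PS, using only base R. It assumes the input is a single string containing only the letters A, C, G and T (upper or lower case); the function name reverse_complement is made up, and other approaches work just as well.

```r
# Hypothetical helper: reverse complement of a DNA string
reverse_complement <- function(dna) {
  # chartr() swaps A<->T and C<->G character by character
  comp <- chartr("ACGTacgt", "TGCAtgca", dna)
  # Split into letters, reverse their order, and join them again
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

reverse_complement("AACGT")   # gives "ACGTT"
```

If the sequence is stored as a vector of single letters, join it first with paste(x, collapse = "").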