March 8th, 2016
function
lapply/sapply
cumsum
read.fasta returns a list of vectors of text
Statistics is a way to tell a story that makes sense of the data
In genomics, we look for biological sense
That story can be about the complete genome
or can be about some region of the genome
For each DNA strand we have \(\%A \approx \%T\) and \(\%G \approx \%C\)
This is presumably because the substitution rates are equal on both strands
Hence, the second parity rule holds only when mutation and substitution affect both strands equally
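As a rough illustration of how these percentages can be checked, the sketch below counts bases on a single strand. The strand is a random toy vector invented here, not real data; with a real genome the vector would come from read.fasta.

```r
# Toy strand, invented for illustration (uniformly random bases)
set.seed(1)
strand <- sample(c("a", "c", "g", "t"), size = 10000, replace = TRUE)

# Percentage of each base on this single strand
base.percent <- 100 * table(strand) / length(strand)
print(round(base.percent, 1))
```

Comparing the entries for "a" and "t", and for "g" and "c", shows how close the strand is to the parity rule.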
But the ratio of G over C is not uniform over the genome
Why?
GC skew changes sign at the boundaries of the two replichores
This corresponds to DNA replication origin or terminus
The replication origin is usually called ori.
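For reference, the GC skew of a window of the sequence is usually defined as follows, where \(G\) and \(C\) are the counts of each base inside the window. This is the common definition and is assumed here; the slides do not spell it out.

\[ \text{GC skew} = \frac{G - C}{G + C} \]

The value is positive where G outnumbers C and negative where C outnumbers G, which is why a sign change can mark a boundary.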
## How DNA replicates {.no-gap}
All our calculations are done using functions
What should be
Remember that an R function is defined like this
name <- function(input1, input2, ...) {
    # calculate the output from the inputs
    return(output)
}
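A concrete instance of this template may help. The function below is a hypothetical example (its name and parameters are invented here) that counts how often one base occurs in a sequence.

```r
# Hypothetical example function: count how many times `base` occurs in `sequence`
count.base <- function(sequence, base) {
    # The comparison gives TRUE/FALSE per element; sum() counts the TRUE values
    output <- sum(sequence == base)
    return(output)
}

n.g <- count.base(c("a", "g", "g", "t"), "g")
print(n.g)   # 2
```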
The GC skew result should depend on:
These are the parameters of the function.
Do they have
How do we transform the input parameters into the output value?
We can use any R function available, such as
seq(from, to, by, length.out)
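Two small calls show what seq produces. The numbers are arbitrary and chosen only for illustration.

```r
# Fixed step: start at 1 and advance by 10 without passing 50
starts <- seq(from = 1, to = 50, by = 10)
print(starts)   # 1 11 21 31 41

# Fixed number of points: 5 evenly spaced values between 0 and 1
points <- seq(from = 0, to = 1, length.out = 5)
print(points)   # 0.00 0.25 0.50 0.75 1.00
```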
Task 1: write a gc.skew function
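One possible shape for such a function is sketched below. It is not the only answer: the parameter names (sequence, position, window) are assumptions made here, and the skew is taken to be (G - C) / (G + C) inside the window, with lowercase bases as in the examples.

```r
# Hypothetical signature: `sequence` is a character vector of lowercase bases,
# `position` is the first index of the window, `window` is its length.
gc.skew <- function(sequence, position, window) {
    # Clip the window so it never runs past the end of the sequence
    last <- min(position + window - 1, length(sequence))
    part <- sequence[position:last]
    n.g <- sum(part == "g")
    n.c <- sum(part == "c")
    # No G or C in the window: the skew is undefined
    if (n.g + n.c == 0) {
        return(NA)
    }
    return((n.g - n.c) / (n.g + n.c))
}

toy <- c("g", "g", "c", "a", "t", "c", "c", "g")
print(gc.skew(toy, position = 1, window = 4))   # (2 - 1) / (2 + 1)
print(gc.skew(toy, position = 5, window = 4))   # (1 - 2) / (1 + 2)
```

Returning NA for a window without G or C is a design choice; it keeps such windows from being mistaken for a balanced one.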
We want to evaluate gc.skew on different positions of the genome
pos <- seq(from=1, to=length(s[[1]]), by=10000)
How do we apply gc.skew to each element of pos?
Since we are working with a single sequence we can do
s1 <- s[[1]]
It is shorter and less error-prone
sapply function
sapply(X, FUN, ...)
sapply returns a vector of the same length as X, each element of which is the result of applying FUN to the corresponding element of X
Put it in your toolbox
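A minimal sketch of both uses of sapply, with toy numbers invented for illustration: a one-argument function, and a function that receives an extra argument through the `...` part of the call.

```r
# One-argument function applied to each element
roots <- sapply(c(1, 4, 9), sqrt)
print(roots)     # 1 2 3

# Arguments given after FUN are passed on to every call of FUN
rounded <- sapply(c(1.234, 5.678), round, digits = 1)
print(rounded)   # 1.2 5.7
```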
s1[1:8]
[1] "a" "g" "c" "t" "t" "t" "t" "c"
sapply(s1[1:8], DNA.to.RNA)
  a   g   c   t   t   t   t   c
"a" "g" "c" "u" "u" "u" "u" "c"
Can you guess the DNA.to.RNA function?
In summary: the inputs are a vector and a function; the output is the result of applying the function to each element of the input vector
DNA.to.RNA
DNA.to.RNA <- function(base) {
    # thymine becomes uracil; every other base is unchanged
    if (base == "t") {
        return("u")
    } else {
        return(base)
    }
}
There are other apply functions. Describe them
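As a starting point, the sketch below compares three members of the family on a toy list invented for illustration: lapply always returns a list, sapply simplifies the result to a vector when it can, and vapply does the same but checks each result against a declared template.

```r
x <- list(a = 1:3, b = 4:6)

# lapply: always a list, one element per input element
res.list <- lapply(x, mean)

# sapply: same calls, simplified to a named numeric vector
res.vec <- sapply(x, mean)

# vapply: like sapply, but each result must be a single number
res.checked <- vapply(x, mean, FUN.VALUE = numeric(1))

print(res.list)
print(res.vec)
print(res.checked)
```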
Write an HTML document (using Rmarkdown) describing the location of the replication origin (ori).
You can use the same function three times with different parameters to draw the GC skew of E. coli for windows of length 1K, 10K and 100K.
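The idea of reusing one function with three window lengths can be sketched as follows. Everything here is an assumption made for illustration: the genome is a random toy vector (G-rich first half, C-rich second half) rather than E. coli, and skew.profile is a hypothetical helper, with the skew taken as (G - C) / (G + C).

```r
# Toy genome, invented for illustration: G-rich first half, C-rich second half
set.seed(42)
half <- 200000
bases <- c("a", "c", "g", "t")
genome <- c(
    sample(bases, half, replace = TRUE, prob = c(0.25, 0.22, 0.28, 0.25)),
    sample(bases, half, replace = TRUE, prob = c(0.25, 0.28, 0.22, 0.25))
)

# Hypothetical helper: GC skew of consecutive windows of length `window`
skew.profile <- function(sequence, window) {
    starts <- seq(from = 1, to = length(sequence) - window + 1, by = window)
    skew <- sapply(starts, function(p) {
        part <- sequence[p:(p + window - 1)]
        n.g <- sum(part == "g")
        n.c <- sum(part == "c")
        (n.g - n.c) / (n.g + n.c)
    })
    return(data.frame(start = starts, skew = skew))
}

# Same function, three window lengths; each list element is one profile
profiles <- lapply(c(1000, 10000, 100000), skew.profile, sequence = genome)
print(sapply(profiles, nrow))
```

Each profile can then be drawn, for example with plot(profiles[[1]]$start, profiles[[1]]$skew, type = "l"). Larger windows give fewer but smoother points, which makes the sign change easier to see.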
We need a summary of the previous classes, including the 1st semester