March 7, 2018
“Perfect is the Enemy of Good”
Do something bad, then improve it
Then improve it again
Like Michelangelo sculptures
We represent the number of pairs of rabbits on month n
with the function fib(n)
.
This function has the following rules:
fib(0) = 0
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2)
Your task is to write a recursive function in R that has input n
and output fib(n)
.
Recursive means that the function calls itself
It is like the factorial example
facto <- function(n) { if(n <= 1) { ans <- 1 } else { m <- n-1 ans <- n * facto(m) } return(ans) }
facto
into fib
fib(0) = 0
means if(n==0) {ans <- 0}
fib(1) = 1
means if(n==1) {ans <- 1}
# author: I CAN NOT DO THİS fib <- function(n) { if(n==0) { ans <- 0 if(n==1) { ans <- 1 } else { ans <- fib(n-1) + fib(n-2) } return(ans) }
Indentation: use spaces to see the structure
fib <- function(n) { if(n==0) { ans <- 0 if(n==1) { ans <- 1 } else { ans <- fib(n-1) + fib(n-2) } return(ans) }
Now you can see that you are missing a {
before the second if
, and a }
at the end
fib <- function(n) { if(n==0) { ans <- 0 } if(n==1) { ans <- 1 } else { ans <- fib(n-1) + fib(n-2) } return(ans) }
Use the debugger to see what happens when n
is 0
(use h
key)
fib <- function(n) { if(n <= 0){ ans <- 0 } else if(n <= 1) { ans <- 1 } else { ans <- fib(n-1) + fib(n-2) } return(ans) }
(use h
key)
n<=0
is a comparison. Result is logicThis: n<-0
is an assignment. Result is numeric
n==0
is a comparison. Result is logicThis: n=0
is an assignment. Result is numeric
The cases n==0
and n==1
can be put together
fib <- function(n) { if(n <= 1){ ans <- n } else { ans <- fib(n-1) + fib(n-2) } return(ans) }
This is optional, but nice
(use h
key)
turtle_right(90) turtle_forward(4.5) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(4.5) turtle_right(90)
# upper body turtle_forward(9) turtle_right(180) turtle_forward(3) # arms turtle_left(90) turtle_forward(9) turtle_right(90) turtle_right(90) turtle_forward(18) turtle_left(180) turtle_forward(9) turtle_left(90)
turtle_forward(10) turtle_forward(5) # first leg turtle_right(40) turtle_forward(10) turtle_forward(3) turtle_left(90) turtle_forward(3) turtle_left(180) turtle_forward(3) turtle_right(40) turtle_right(50) turtle_forward(13)
# get back to initial angle turtle_right(40) turtle_right(90) turtle_left(10) turtle_right(20) # second leg turtle_left(5) turtle_left(40) turtle_forward(13) turtle_left(90) turtle_forward(3) turtle_hide()
turtle_right(90) turtle_forward(4.5) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(9) turtle_left(90) turtle_forward(4.5) turtle_right(90)
turtle_right(90) turtle_forward(4.5) for(i in 1:3) { turtle_left(90) turtle_forward(9) } turtle_left(90) turtle_forward(4.5) turtle_right(90)
(use h
key)
turtle_right(90) turtle_forward(4.5) for(i in 1:3) { turtle_left(90) turtle_forward(9) } turtle_left(90) turtle_forward(4.5)
Let’s say size=9
(for now)
turtle_right(90) turtle_forward(size/2) for(i in 1:3) { turtle_left(90) turtle_forward(size) } turtle_left(90) turtle_forward(size/2)
(use h
key)
# upper body turtle_forward(9) turtle_right(180) turtle_forward(3) # arms turtle_left(90) turtle_forward(9) turtle_right(90) turtle_right(90) turtle_forward(18) turtle_left(180) turtle_forward(9) turtle_left(90)
# upper body turtle_forward(size*2/3) # arms turtle_right(90) turtle_forward(size) turtle_right(180) turtle_forward(size*2) turtle_left(180) turtle_forward(size) turtle_left(90)
turtle_forward(10) turtle_forward(5) # first leg turtle_right(40) turtle_forward(10) turtle_forward(3) turtle_left(90) turtle_forward(3) turtle_left(180) turtle_forward(3) turtle_right(40) turtle_right(50) turtle_forward(13)
turtle_forward(15) # first leg turtle_right(40) turtle_forward(13) turtle_left(90) turtle_forward(3) turtle_left(180) turtle_forward(3) turtle_right(90) turtle_forward(13)
turtle_forward(15) turtle_right(40) # first leg turtle_forward(13) turtle_left(90) turtle_forward(3) turtle_left(180) turtle_forward(3) turtle_right(90) turtle_forward(13)
turtle_forward(size*15/9) turtle_right(40) # first leg turtle_forward(size*13/9) turtle_left(90) turtle_forward(size/3) turtle_left(180) turtle_forward(size/3) turtle_right(90) turtle_forward(size*13/9)
# get back to initial angle turtle_right(40) turtle_right(90) turtle_left(10) turtle_right(20) # first leg turtle_left(5) turtle_left(40) turtle_forward(13) turtle_left(90) turtle_forward(3)
\(40+90-10+20 = 140\)
# get back to initial angle turtle_right(140) # first leg turtle_left(45) turtle_forward(13) turtle_left(90) turtle_forward(3)
draw_person <- function(size) { draw_head(size*1.2) turtle_left(180) turtle_forward(size) turtle_left(90) draw_arm(size*1.5) turtle_left(180) draw_arm(size*1.5) turtle_left(90) turtle_forward(size*2) turtle_left(20) draw_leg(size*2) turtle_right(40) draw_leg(size*2) }
This is the main function.
It is my part.
I can change it.
You should not change it.
Instead, you have to provide functions for head, arms and legs.
draw_head <- function(size) { } draw_arm <- function(size) { } draw_leg <- function(size) { }
Each part commits to do something
Each part makes a promise
We promise to leave the turtle in the same position as we received it
You should always start with the easy parts
The easy part may not be the first part
We receive the turtle pointing in the arm direction
We must leave the turtle in the same place and same angle
draw_arm <- function(size) { turtle_forward(size) turtle_backward(size) }
draw_arm <- function(size) { turtle_forward(size) turtle_backward(size) }
draw_leg <- function(size) { turtle_forward(size) turtle_left(90) turtle_forward(size/3) turtle_backward(size/3) turtle_right(90) turtle_backward(size) }
Notice that to undo something you have to undo each part in reverse order
You put your socks first, then your shoes
To undo you first “un-put” your shoes, then your socks
In general
\[Undo(A,B,C) = Undo(C), Undo(B), Undo(A)\]
Each function has a separate environment with its own variables
We can store position and angle at the start, and reset them at the end
draw_leg <- function(size) { old_pos <- turtle_getpos() old_angle <- turtle_getangle() turtle_forward(size) turtle_left(90) turtle_forward(size/3) turtle_setangle(old_angle) turtle_setpos(old_pos[1], old_pos[2]) }
This is useful when the drawing is complex. Keep it in mind
In this case we separated the big problem on independent parts. That allows us to change each part without affecting the others, as long as we keep our promises.
For example, you can change the position of the hands, the shape of the head and hands, an others. The code is in stick-person-2.R.
Candidatus Carsonella ruddii is a obligate symbiont of Pachpsylla venusta.
There is much controversy over whether this is a living cell or simply an organelle as it is missing genes needed for living independently.
Published as “The 160-kilobase genome of the bacterial endosymbiont Carsonella.” https://www.ncbi.nlm.nih.gov/pubmed/17038615
ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATGGTACGTTATGTGTTATAAGAATTTCTGGAA AAAATGTAAAATTTTTAATACAAAAAATTGTAAAAAAAAATATAAAAGAAAAAATAGCTACTTTTTCTAA ATTATTTTTAGATAAAGAATGTGTAGATTATGCAATGATTATTTTTTTTAAAAAACCAAATACGTTCACT GGAGAAGATATAATCGAATTTCATATTCACAATAATGAAACTATTGTAAAAAAAATAATTAATTATTTAT TATTAAATAAAGCAAGATTTGCAAAAGCTGGCGAATTTTTAGAAAGACGATATTTAAATGGAAAAATTTC TTTAATAGAATGCGAATTAATAAATAATAAAATTTTATATGATAATGAAAATATGTTTCAATTAACAAAA AATTCTGAAAAAAAAATATTTTTATGTATAATTAAAAATTTAAAATTTAAAATAAATTCTTTAATAATTT GTATTGAAATCGCAAATTTTAATTTTAGTTTTTTTTTTTTTAATGATTTTTTATTTATAAAATATACATT TAAAAAACTATTAAAACTTTTAAAAATATTAATTGATAAAATAACTGTTATAAATTATTTAAAAAAGAAT TTCACAATAATGATATTAGGTAGAAGAAATGTAGGAAAGTCTACTTTATTTAATAAAATATGTGCACAAT ATGACTCGATTGTAACTAATATTCCTGGTACTACAAAAAATATTATATCAAAAAAAATAAAAATTTTATC TAAAAAAATAAAAATGATGGATACAGCAGGATTAAAAATTAGAACTAAAAATTTAATTGAAAAAATTGGA ATTATTAAAAATATAAATAAAATTTATCAAGGAAATTTAATTTTGTATATGATTGATAAATTTAATATTA AAAATATATTTTTTAACATTCCAATAGATTTTATTGATAAAATTAAATTAAATGAATTAATAATTTTAGT TAACAAATCAGATATTTTAGGAAAAGAAGAAGGAGTTTTTAAAATAAAAAATATATTAATAATTTTAATT TCTTCTAAAAATGGAACTTTTATAAAAAATTTAAAATGTTTTATTAATAAAATCGTTGATAATAAAGATT TTTCTAAAAATAATTATTCTGATGTTAAAATTCTATTTAATAAATTTTCTTTTTTTTATAAAGAATTTTC ATGTAACTATGATTTAGTGTTATCAAAATTAATTGATTTTCAAAAAAATATATTTAAATTAACAGGAAAT TTTACTAATAAAAAAATAATAAATTCTTGTTTTAGAAATTTTTGTATTGGTAAATGAATATTTTTAATAT AATTATTATTGGAGCAGGACATTCTGGTATAGAAGCAGCTATATCTGCATCTAAAATATGTAATAAAATA
A
, turtle moves leftT
, turtle moves rightG
, turtle moves upC
, turtle moves downlibrary(TurtleGraphics) turtle_init(mode = "clip") turtle_hide() for(i in 1:1000) { if(dna[i]=="A") { turtle_setangle(90) } else if(dna[i]=="T") { turtle_setangle(270) } else if(dna[i]=="G") { turtle_setangle(0) } else if(dna[i]=="C") { turtle_setangle(180) } turtle_forward(1) }
segments(x0, y0, x1, y1)
type="n"
ann=FALSE
x
and y
give horizontal and vertical rangepar(mar=...)
Read the help of segments
, par
and plot.default
par(mar=c(2, 2, 0.3, 0.3)) plot(x=c(-500,2500), y=c(-700,1100), type = "n", ann=FALSE) x <- y <- 0 for(i in 1:length(dna)) { if(dna[i]=="A") { segments(x, y, x+1, y) x <- x+1 } else if(dna[i]=="T") { segments(x, y, x-1, y) x <- x-1 } else if(dna[i]=="G") { segments(x, y, x, y+1) y <- y+1 } else if(dna[i]=="C") { segments(x, y, x, y-1) y <- y-1 } }
How can you answer these questions?
Despite being large molecules, DNA and proteins are made with pieces of only a few types
DNA is made of four bases. We can represent it with four letters
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Proteins have 20 aminoacids and three stop codons. We can represent them with 23 letters.
MKRISTTITTTITITTGNGAG
In molecular biology we often work with sequences
The main reason why computing is useful for molecular biology
There are several ways to store DNA or protein data
Most of the times they are stored in FASTA format
FASTA files are text files, with some rules
You should never use word to store sequences
AP009180.1 Candidatus Carsonella ruddii PV DNA, complete genome ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATGGTACGTTATGTGTTATAAGAATTTCTGGAA AAAATGTAAAATTTTTAATACAAAAAATTGTAAAAAAAAATATAAAAGAAAAAATAGCTACTTTTTCTAA ATTATTTTTAGATAAAGAATGTGTAGATTATGCAATGATTATTTTTTTTAAAAAACCAAATACGTTCACT GGAGAAGATATAATCGAATTTCATATTCACAATAATGAAACTATTGTAAAAAAAATAATTAATTATTTAT TATTAAATAAAGCAAGATTTGCAAAAGCTGGCGAATTTTTAGAAAGACGATATTTAAATGGAAAAATTTC TTTAATAGAATGCGAATTAATAAATAATAAAATTTTATATGATAATGAAAATATGTTTCAATTAACAAAA AATTCTGAAAAAAAAATATTTTTATGTATAATTAAAAATTTAAAATTTAAAATAAATTCTTTAATAATTT GTATTGAAATCGCAAATTTTAATTTTAGTTTTTTTTTTTTTAATGATTTTTTATTTATAAAATATACATT TAAAAAACTATTAAAACTTTTAAAAATATTAATTGATAAAATAACTGTTATAAATTATTTAAAAAAGAAT TTCACAATAATGATATTAGGTAGAAGAAATGTAGGAAAGTCTACTTTATTTAATAAAATATGTGCACAAT ATGACTCGATTGTAACTAATATTCCTGGTACTACAAAAAATATTATATCAAAAAAAATAAAAATTTTATC TAAAAAAATAAAAATGATGGATACAGCAGGATTAAAAATTAGAACTAAAAATTTAATTGAAAAAATTGGA ATTATTAAAAATATAAATAAAATTTATCAAGGAAATTTAATTTTGTATATGATTGATAAATTTAATATTA AAAATATATTTTTTAACATTCCAATAGATTTTATTGATAAAATTAAATTAAATGAATTAATAATTTTAGT TAACAAATCAGATATTTTAGGAAAAGAAGAAGGAGTTTTTAAAATAAAAAATATATTAATAATTTTAATT TCTTCTAAAAATGGAACTTTTATAAAAAATTTAAAATGTTTTATTAATAAAATCGTTGATAATAAAGATT TTTCTAAAAATAATTATTCTGATGTTAAAATTCTATTTAATAAATTTTCTTTTTTTTATAAAGAATTTTC ATGTAACTATGATTTAGTGTTATCAAAATTAATTGATTTTCAAAAAAATATATTTAAATTAACAGGAAAT TTTACTAATAAAAAAATAATAAATTCTTGTTTTAGAAATTTTTGTATTGGTAAATGAATATTTTTAATAT AATTATTATTGGAGCAGGACATTCTGGTATAGAAGCAGCTATATCTGCATCTAAAATATGTAATAAAATA
… and more
>NC_000913.3_prot_NP_414542.1_1 [gene=thrL] [protein=thr operon leader peptide] [protein_id=NP_414542.1] [location=190..255] MKRISTTITTTITITTGNGAG >NC_000913.3_prot_NP_414543.1_2 [gene=thrA] [protein=Bifunctional aspartokinase/homoserine dehydrogenase 1] [protein_id=NP_414543.1] [location=337..2799] MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERI FAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEA RGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYS AAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPC LIKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGISAKFFAAL ARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSW LKNKHIDLRVCGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAV ADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELM KFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIE IEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFK VKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLRTLSWKLGV >NC_000913.3_prot_NP_414544.1_3 [gene=thrB] [protein=homoserine kinase] [protein_id=NP_414544.1] [location=2801..3733] MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWE RFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHY DNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGF IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA DWLGKNYLQNQEGFVHICRLDTAGARVLEN >NC_000913.3_prot_NP_414545.1_4 [gene=thrC] [protein=L-threonine synthase] [protein_id=NP_414545.1] [location=3734..5020] MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDEMLKLDFVTRSAKILSAFIGDEIPQE ILEERVRAAFAFPAPVANVESDVGCLELFHGPTLAFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAA VAHAFYGLPNVKVVILYPRGKISPLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNS ANSINISRLLAQICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVP RFLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTMRELKELGYTS EPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKELAERADLPLLSHNLPADFAAL RKLMMNHQ >NC_000913.3_prot_NP_414546.1_5 [gene=yaaX] [protein=DUF2502 family putative periplasmic protein] [protein_id=NP_414546.1] [location=5234..5530] MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHL HGPPPPPRHHKKAPHDHHGGHGPGKHHR >NC_000913.3_prot_NP_414547.1_6 [gene=yaaA] [protein=peroxide resistance protein, lowers intracellular iron] [protein_id=NP_414547.1] [location=complement(5683..6459)] MLILISPAKTLDYQSPLTTTRYTLPELLDNSQQLIHEARKLTPPQISTLMRISDKLAGINAARFHDWQPD FTPANARQAILAFKGDVYTGLQAETFSEDDFDFAQQHLRMLSGLYGVLRPLDLMQPYRLEMGIRLENARG
To handle sequence data in R, we use the seqinr library
You have to install it once.
install.packages("seqinr")
Then you have to load it on every session
library(seqinr)