[1] "M" "C" "B" "Z" "I" "D" "G" "W" "H" "O"
R has four basic data types: numeric, logical, character, and factor.
Only character values are printed in quotes (").
R data structures include vectors and data frames, among others.
Vectors show [1] to the left.
G Z P I A V D Y C H
9 2 3 10 8 6 4 7 1 5
b <- c(9, 2, 3, 10, 8, 6, 4, 7, 1, 5)
names(b) <- c("G", "Z", "P", "I", "A", "V", "D", "Y", "C", "H")
b
G Z P I A V D Y C H
9 2 3 10 8 6 4 7 1 5
When vectors are named, they do not show [1].
Instead, they show each element’s name above its value.
What does x[2:4] return?
"g" "e" "j"
g, j
[1] g e j k
"i, g, e, j"
["g","e","j","k"]
"g", "e"
g, e, j
g j
"i", "g","e","j"
"g", "j", "k"
"e", "j"
"g", "j"
[1] "g" "e" "j"
x[2:4] returns every element from position 2 to position 4, inclusive.
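A minimal sketch of range indexing. The vector x below is hypothetical (the original x is not shown here); it is chosen so that positions 2 to 4 hold "g", "e" and "j".

```r
# Hypothetical vector, invented for illustration
x <- c("i", "g", "e", "j", "k")

x[2:4]       # positions 2, 3 and 4
x[c(2, 4)]   # only positions 2 and 4
```

Note that 2:4 means 2, 3, 4, so the element in the middle is included.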
What is the value of a after all these steps?
7
10
a <- 2+5
Assignment copies values and keeps variables independent.
The variables are not the same thing; they only hold the same value.
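A small sketch of this idea. The exact steps used in the quiz are not shown, so the variable b and the numbers below are assumptions made for illustration.

```r
a <- 2 + 5   # a is 7
b <- a       # b receives a copy of the value 7
b <- b + 3   # changing b ...
a            # ... leaves a unchanged
b
```

After these steps a and b hold different values, because b was never linked to a, only given its value.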
31
33
[1] 9 4 8 10
9, 4, 8, 10
(9, 4, 8, 10)
"9", "4", "8", "10"
4
v <- c(4,8,9,10)- 7.75
I dont know
How many elements are greater than 3?
[1] 9 4 8 1 3 2 10
[1] TRUE TRUE TRUE FALSE FALSE FALSE TRUE
[1] 4
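The three outputs above can be reproduced with a sketch like the following. The variable name v is an assumption; the values are the ones printed above.

```r
v <- c(9, 4, 8, 1, 3, 2, 10)

v            # the vector itself
v > 3        # one TRUE/FALSE per element
sum(v > 3)   # TRUE counts as 1 and FALSE as 0
```

Summing a logical vector counts how many elements satisfy the condition.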
[[1]]
[1] "e" "y" "l" "s" "t" "o"
[[2]]
[1] "O" "R" "K" "C" "U" "W" "P" "T"
If you see double brackets [[ ]], you are looking at a list.
Lists are like vectors, but they can mix different kinds of elements.
people <- list(c(60, 72, 57, 90, 95, 72),
               c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               c("Ali", "Deniz", "Fatma", "Emre",
                 "Volkan", "Onur"),
               TRUE,
               factor(c("M","F","F","M","M","M")))
Notice that the elements can have different lengths.
[[1]]
[1] 60 72 57 90 95 72
[[2]]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
[[3]]
[1] "Ali" "Deniz" "Fatma" "Emre" "Volkan" "Onur"
[[4]]
[1] TRUE
[[5]]
[1] M F F M M M
Levels: F M
Each list element starts with a number in double brackets
Inside each element, we can see vectors, lists or other things
When the element is a vector, we see a second number, in single brackets
[[1]]
[1] 60 72 57 90 95 72
[[2]]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
This is a sublist (with one element):
[[1]]
[1] 60 72 57 90 95 72
This is an element:
[1] 60 72 57 90 95 72
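The commands behind these two outputs are not shown; a plausible sketch is that single brackets give a sublist and double brackets give the element itself. The names sub and el are introduced here only for illustration.

```r
people <- list(c(60, 72, 57, 90, 95, 72),
               c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91))

sub <- people[1]     # single brackets: a list with one element
el  <- people[[1]]   # double brackets: the element itself

class(sub)           # "list"
class(el)            # "numeric"
```

A sublist is still a list, so it prints with [[1]]; the element is a plain vector, so it prints with [1] only.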
people <- list(weight=c(60, 72, 57, 90, 95, 72),
               height=c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               names=c("Ali", "Deniz", "Fatma", "Emre",
                       "Volkan", "Onur"),
               valid=TRUE,
               gender=factor(c("M","F","F","M","M","M")))
How else can we assign names?
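One possible answer, sketched on a shortened list: names can be assigned after the list is created with names(), or while creating it with setNames(). The shortened data and the name people2 are assumptions for illustration.

```r
# Unnamed list, then names assigned afterwards
people <- list(c(60, 72, 57), c(1.75, 1.80, 1.65))
names(people) <- c("weight", "height")
names(people)

# Alternative: setNames() returns the list with its names set
people2 <- setNames(list(c(60, 72, 57), c(1.75, 1.80, 1.65)),
                    c("weight", "height"))
```

Both approaches give the same result as writing weight=... and height=... inside list().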
$weight
[1] 60 72 57 90 95 72
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
$names
[1] "Ali" "Deniz" "Fatma" "Emre" "Volkan" "Onur"
$valid
[1] TRUE
$gender
[1] M F F M M M
Levels: F M
$weight
[1] 60 72 57 90 95 72
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
This is a sublist:
$weight
[1] 60 72 57 90 95 72
This is an element:
[1] 60 72 57 90 95 72
[1] 60 72 57 90 95 72
[1] 60 72 57 90 95 72
[1] 60 72 57 90 95 72
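With a named list there are several ways to reach the same element. The following sketch shows three that give identical results; it is an assumption that these correspond to the three outputs above, since the commands themselves are not shown.

```r
people <- list(weight = c(60, 72, 57, 90, 95, 72),
               height = c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91))

people[[1]]          # by position
people[["weight"]]   # by name, with double brackets
people$weight        # by name, with the $ shortcut
```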
Indices can also be used to change specific parts of a list.
For example, we can update the names:
[1] "ALI" "DENIZ" "FATMA" "EMRE" "VOLKAN" "ONUR"
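A sketch of one way to get this result, assuming the update is done with toupper() on the names element (the original command is not shown).

```r
people <- list(names = c("Ali", "Deniz", "Fatma", "Emre",
                         "Volkan", "Onur"),
               valid = TRUE)

# Replace one element of the list with a modified copy of itself
people$names <- toupper(people$names)
people$names
```

Only the names element changes; the other elements of the list are left as they were.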
$weight
[1] 60 72 57 90 95 72
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
$names
[1] "ALI" "DENIZ" "FATMA" "EMRE" "VOLKAN" "ONUR"
$gender
[1] M F F M M M
Levels: F M
$weight
[1] 60 72 57 90 95 72
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
$names
[1] "ALI" "DENIZ" "FATMA" "EMRE" "VOLKAN" "ONUR"
$gender
[1] M F F M M M
Levels: F M
$BMI
[1] 19.59184 22.22222 20.93664 24.93075 31.37799 19.73630
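In the two outputs above the valid element has disappeared and a BMI element has appeared. A sketch of how this can be done, assuming valid was removed by assigning NULL and BMI was computed as weight divided by height squared (which reproduces the printed values):

```r
people <- list(weight = c(60, 72, 57, 90, 95, 72),
               height = c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               valid  = TRUE)

people$valid <- NULL                            # assigning NULL removes an element
people$BMI <- people$weight / people$height^2   # a new name adds an element
names(people)
people$BMI
```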
Try these to compare [[ ]] and [ ]:
[1] 1.75 1.80 1.65 1.90 1.74 1.91
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
[1] 1.65
$<NA>
NULL
Error in people[[1:3]]: recursive indexing failed at level 2
$weight
[1] 60 72 57 90 95 72
$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91
$names
[1] "ALI" "DENIZ" "FATMA" "EMRE" "VOLKAN" "ONUR"
[1] 60 72 57 90 95 72
[1] 60 72 57 90 95 72
$weight
[1] 60 72 57 90 95 72
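The commands for this exercise are not shown. The sketch below uses a shortened list and a set of commands that produce the same kinds of output; the exact indices used in the original exercise are an assumption.

```r
people <- list(weight = c(60, 72, 57, 90, 95, 72),
               height = c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               names  = c("ALI", "DENIZ", "FATMA", "EMRE", "VOLKAN", "ONUR"))

people[[2]]        # the height vector itself
people[2]          # a sublist containing height
people[[2]][3]     # third value of the height vector
people[4]          # position that does not exist: a list holding NULL, named <NA>
people[1:3]        # single brackets accept several indices

# Double brackets with several indices mean recursive indexing,
# which fails here; tryCatch() keeps the script running
result <- tryCatch(people[[1:3]],
                   error = function(e) conditionMessage(e))
result
```

Single brackets always return a list, so they tolerate ranges and missing positions; double brackets return exactly one element.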
library(seqinr)
proteins <- read.fasta("AP009180.faa", seqtype="AA", set.attributes = FALSE)
proteins[1:10]
$`lcl|AP009180.1_prot_BAF35032.1_1`
[1] "M" "N" "T" "I" "F" "S" "R" "I" "T" "P" "L" "G" "N" "G" "T" "L"
[17] "C" "V" "I" "R" "I" "S" "G" "K" "N" "V" "K" "F" "L" "I" "Q" "K"
[33] "I" "V" "K" "K" "N" "I" "K" "E" "K" "I" "A" "T" "F" "S" "K" "L"
[49] "F" "L" "D" "K" "E" "C" "V" "D" "Y" "A" "M" "I" "I" "F" "F" "K"
[65] "K" "P" "N" "T" "F" "T" "G" "E" "D" "I" "I" "E" "F" "H" "I" "H"
[81] "N" "N" "E" "T" "I" "V" "K" "K" "I" "I" "N" "Y" "L" "L" "L" "N"
[97] "K" "A" "R" "F" "A" "K" "A" "G" "E" "F" "L" "E" "R" "R" "Y" "L"
[113] "N" "G" "K" "I" "S" "L" "I" "E" "C" "E" "L" "I" "N" "N" "K" "I"
[129] "L" "Y" "D" "N" "E" "N" "M" "F" "Q" "L" "T" "K" "N" "S" "E" "K"
[145] "K" "I" "F" "L" "C" "I" "I" "K" "N" "L" "K" "F" "K" "I" "N" "S"
[161] "L" "I" "I" "C" "I" "E" "I" "A" "N" "F" "N" "F" "S" "F" "F" "F"
[177] "F" "N" "D" "F" "L" "F" "I" "K" "Y" "T" "F" "K" "K" "L" "L" "K"
[193] "L" "L" "K" "I" "L" "I" "D" "K" "I" "T" "V" "I" "N" "Y" "L" "K"
[209] "K" "N" "F" "T" "I" "M" "I" "L" "G" "R" "R" "N" "V" "G" "K" "S"
[225] "T" "L" "F" "N" "K" "I" "C" "A" "Q" "Y" "D" "S" "I" "V" "T" "N"
[241] "I" "P" "G" "T" "T" "K" "N" "I" "I" "S" "K" "K" "I" "K" "I" "L"
[257] "S" "K" "K" "I" "K" "M" "M" "D" "T" "A" "G" "L" "K" "I" "R" "T"
[273] "K" "N" "L" "I" "E" "K" "I" "G" "I" "I" "K" "N" "I" "N" "K" "I"
[289] "Y" "Q" "G" "N" "L" "I" "L" "Y" "M" "I" "D" "K" "F" "N" "I" "K"
[305] "N" "I" "F" "F" "N" "I" "P" "I" "D" "F" "I" "D" "K" "I" "K" "L"
[321] "N" "E" "L" "I" "I" "L" "V" "N" "K" "S" "D" "I" "L" "G" "K" "E"
[337] "E" "G" "V" "F" "K" "I" "K" "N" "I" "L" "I" "I" "L" "I" "S" "S"
[353] "K" "N" "G" "T" "F" "I" "K" "N" "L" "K" "C" "F" "I" "N" "K" "I"
[369] "V" "D" "N" "K" "D" "F" "S" "K" "N" "N" "Y" "S" "D" "V" "K" "I"
[385] "L" "F" "N" "K" "F" "S" "F" "F" "Y" "K" "E" "F" "S" "C" "N" "Y"
[401] "D" "L" "V" "L" "S" "K" "L" "I" "D" "F" "Q" "K" "N" "I" "F" "K"
[417] "L" "T" "G" "N" "F" "T" "N" "K" "K" "I" "I" "N" "S" "C" "F" "R"
[433] "N" "F" "C" "I" "G" "K"
$`lcl|AP009180.1_prot_BAF35033.1_2`
[1] "M" "N" "I" "F" "N" "I" "I" "I" "I" "G" "A" "G" "H" "S" "G" "I"
[17] "E" "A" "A" "I" "S" "A" "S" "K" "I" "C" "N" "K" "I" "K" "I" "I"
[33] "T" "S" "N" "L" "E" "N" "L" "G" "I" "M" "S" "C" "N" "P" "S" "I"
[49] "G" "G" "I" "G" "K" "S" "H" "L" "V" "K" "E" "L" "E" "L" "F" "G"
[65] "G" "I" "M" "P" "E" "A" "S" "D" "Y" "S" "R" "I" "H" "S" "K" "L"
[81] "L" "N" "Y" "K" "K" "G" "E" "S" "V" "H" "S" "L" "R" "Y" "Q" "I"
[97] "D" "R" "I" "L" "Y" "K" "N" "Y" "I" "L" "K" "I" "L" "F" "L" "K"
[113] "K" "N" "I" "L" "I" "E" "Q" "N" "E" "I" "N" "K" "I" "I" "R" "F"
[129] "K" "K" "K" "I" "L" "I" "F" "N" "K" "L" "K" "F" "F" "N" "I" "A"
[145] "K" "I" "I" "I" "V" "C" "A" "G" "T" "F" "I" "N" "S" "K" "I" "Y"
[161] "I" "G" "K" "N" "I" "K" "A" "L" "N" "K" "A" "E" "K" "K" "S" "I"
[177] "S" "Y" "S" "F" "K" "K" "I" "N" "L" "F" "I" "S" "K" "L" "K" "T"
[193] "G" "T" "P" "P" "R" "L" "D" "L" "N" "Y" "L" "N" "Y" "K" "K" "L"
[209] "S" "V" "Q" "Y" "S" "D" "Y" "T" "I" "S" "Y" "G" "K" "N" "F" "N"
[225] "F" "N" "N" "N" "V" "K" "C" "F" "I" "T" "N" "T" "D" "N" "K" "I"
[241] "N" "N" "F" "I" "K" "K" "N" "I" "K" "N" "S" "S" "L" "F" "N" "L"
[257] "K" "F" "K" "S" "I" "G" "P" "R" "Y" "C" "P" "S" "I" "E" "D" "K"
[273] "I" "F" "K" "F" "P" "N" "N" "K" "N" "H" "Q" "I" "F" "L" "E" "P"
[289] "E" "S" "Y" "F" "S" "K" "E" "I" "Y" "V" "N" "G" "L" "S" "N" "S"
[305] "L" "S" "Y" "N" "I" "Q" "K" "K" "L" "I" "K" "K" "I" "L" "G" "I"
[321] "K" "K" "S" "Y" "I" "I" "R" "Y" "A" "Y" "N" "I" "Q" "Y" "D" "Y"
[337] "F" "D" "P" "R" "C" "L" "K" "I" "S" "L" "N" "I" "K" "F" "A" "N"
[353] "N" "I" "F" "L" "A" "G" "Q" "I" "N" "G" "T" "T" "G" "Y" "E" "E"
[369] "A" "S" "S" "Q" "G" "F" "V" "A" "G" "I" "N" "S" "A" "R" "K" "I"
[385] "L" "K" "L" "P" "L" "W" "K" "P" "K" "K" "W" "N" "S" "Y" "I" "G"
[401] "V" "L" "L" "Y" "D" "L" "T" "N" "F" "G" "I" "Q" "E" "P" "Y" "R"
[417] "I" "F" "T" "S" "K" "S" "D" "N" "R" "L" "F" "L" "R" "F" "D" "N"
[433] "A" "I" "F" "R" "L" "I" "N" "I" "S" "Y" "Y" "L" "G" "C" "L" "P"
[449] "I" "V" "K" "F" "K" "Y" "Y" "N" "S" "L" "I" "Y" "K" "F" "Y" "K"
[465] "N" "L" "I" "N" "I" "R" "K" "I" "K" "L" "F" "D" "N" "F" "Y" "L"
[481] "F" "K" "L" "I" "I" "I" "M" "S" "K" "Y" "Y" "G" "Y" "I" "K" "K"
[497] "K" "Y" "F" "K"
$`lcl|AP009180.1_prot_BAF35034.1_3`
[1] "M" "V" "I" "L" "K" "K" "N" "I" "L" "N" "N" "F" "L" "N" "F" "K"
[17] "I" "I" "D" "L" "N" "L" "I" "I" "L" "L" "L" "F" "I" "H" "L" "I"
[33] "V" "F" "Y" "L" "L" "K" "N" "N" "N" "L" "M" "I" "L" "L" "S" "I"
[49] "Y" "L" "N" "N" "F" "I" "K" "N" "S" "I" "N" "L" "N" "S" "R" "N"
[65] "I" "I" "F" "F" "F" "S" "L" "V" "L" "F" "N" "I" "I" "L" "F" "S"
[81] "N" "F" "I" "D" "L" "F" "P" "N" "N" "L" "I" "K" "N" "F" "L" "N"
[97] "L" "K" "Q" "I" "E" "I" "V" "P" "T" "S" "N" "I" "N" "I" "T" "F"
[113] "C" "F" "S" "I" "I" "S" "F" "L" "I" "I" "I" "M" "L" "T" "H" "K"
[129] "K" "I" "G" "F" "K" "K" "Y" "I" "Y" "S" "F" "F" "I" "Y" "P" "I"
[145] "N" "T" "E" "Y" "L" "Y" "L" "F" "N" "F" "I" "I" "E" "S" "I" "S"
[161] "Y" "I" "M" "K" "P" "I" "S" "L" "S" "L" "R" "L" "F" "G" "N" "I"
[177] "F" "S" "S" "E" "I" "I" "F" "N" "I" "I" "N" "N" "M" "N" "V" "F"
[193] "I" "N" "S" "F" "L" "N" "L" "I" "W" "G" "I" "F" "H" "F" "I" "I"
[209] "L" "P" "L" "Q" "S" "F" "I" "F" "I" "T" "L" "V" "I" "I" "Y" "V"
[225] "S" "Q" "T" "L" "N" "H"
$`lcl|AP009180.1_prot_BAF35035.1_4`
[1] "M" "N" "N" "L" "L" "I" "L" "S" "S" "S" "I" "M" "I" "G" "L" "S"
[17] "S" "I" "G" "T" "G" "I" "G" "F" "G" "I" "L" "G" "G" "K" "L" "L"
[33] "D" "S" "I" "S" "R" "Q" "P" "E" "L" "D" "N" "L" "L" "L" "T" "R"
[49] "T" "F" "L" "M" "T" "G" "L" "L" "D" "A" "I" "P" "M" "I" "S" "V"
[65] "G" "I" "G" "L" "Y" "L" "I" "F" "V" "L" "S" "N" "K"
$`lcl|AP009180.1_prot_BAF35036.1_5`
[1] "M" "N" "F" "N" "Y" "T" "I" "I" "N" "E" "F" "V" "S" "F" "L" "I"
[17] "F" "F" "Y" "V" "S" "F" "K" "I" "I" "F" "P" "V" "I" "L" "K" "K"
[33] "I" "N" "N" "F" "L" "I" "I" "D" "Y" "K" "N" "F" "V" "F" "N" "N"
[49] "Q" "E" "K" "I" "I" "K" "K" "K" "L" "L" "D" "E" "I" "V" "K" "N"
[65] "E" "N" "L" "T" "N" "K" "K" "F" "I" "S" "L" "I" "E" "K" "I" "K"
[81] "K" "S" "I" "L" "L" "E" "K" "Q" "N" "F" "I" "N" "F" "I" "K" "L"
[97] "E" "K" "I" "N" "V" "L" "K" "I" "F" "K" "K" "K" "I" "L" "N" "N"
[113] "N" "M" "L" "I" "I" "K" "N" "F" "L" "I" "E" "I" "K" "K" "L" "F"
[129] "I" "N" "S" "F" "K" "N" "I" "F" "N" "E" "I" "I" "C" "Y" "N" "N"
[145] "E" "F" "I" "I" "N" "Y" "V"
$`lcl|AP009180.1_prot_BAF35037.1_6`
[1] "M" "F" "K" "F" "I" "N" "R" "F" "L" "N" "L" "K" "K" "R" "Y" "F"
[17] "Y" "I" "F" "L" "I" "N" "F" "F" "Y" "F" "F" "N" "K" "C" "N" "F"
[33] "I" "K" "K" "K" "K" "I" "Y" "K" "K" "I" "I" "T" "K" "K" "F" "E"
[49] "N" "Y" "L" "L" "K" "L" "I" "I" "Q" "K" "Y" "A" "K"
$`lcl|AP009180.1_prot_BAF35038.1_7`
[1] "M" "L" "N" "E" "G" "I" "I" "N" "K" "I" "Y" "D" "S" "V" "V" "E"
[17] "V" "L" "G" "L" "K" "N" "A" "K" "Y" "G" "E" "M" "I" "L" "F" "S"
[33] "K" "N" "I" "K" "G" "I" "V" "F" "S" "L" "N" "K" "K" "N" "V" "N"
[49] "I" "I" "I" "L" "N" "N" "Y" "N" "E" "L" "T" "Q" "G" "E" "K" "C"
[65] "Y" "C" "T" "N" "K" "I" "F" "E" "V" "P" "V" "G" "K" "Q" "L" "I"
[81] "G" "R" "I" "I" "N" "S" "R" "G" "E" "T" "L" "D" "L" "L" "P" "E"
[97] "I" "K" "I" "N" "E" "F" "S" "P" "I" "E" "K" "I" "A" "P" "G" "V"
[113] "M" "D" "R" "E" "T" "V" "N" "E" "P" "L" "L" "T" "G" "I" "K" "S"
[129] "I" "D" "S" "M" "I" "P" "I" "G" "K" "G" "Q" "R" "E" "L" "I" "I"
[145] "G" "D" "R" "Q" "T" "G" "K" "T" "T" "I" "C" "I" "D" "T" "I" "I"
[161] "N" "Q" "K" "N" "K" "N" "I" "I" "C" "V" "Y" "V" "C" "I" "G" "Q"
[177] "K" "I" "S" "S" "L" "I" "N" "I" "I" "N" "K" "L" "K" "K" "F" "N"
[193] "C" "L" "E" "Y" "T" "I" "I" "V" "A" "S" "T" "A" "S" "D" "S" "A"
[209] "A" "E" "Q" "Y" "I" "A" "P" "Y" "T" "G" "S" "T" "I" "S" "E" "Y"
[225] "F" "R" "D" "K" "G" "Q" "D" "C" "L" "I" "V" "Y" "D" "D" "L" "T"
[241] "K" "H" "A" "W" "A" "Y" "R" "Q" "I" "S" "L" "L" "L" "R" "R" "P"
[257] "P" "G" "R" "E" "A" "Y" "P" "G" "D" "V" "F" "Y" "L" "H" "S" "R"
[273] "L" "L" "E" "R" "S" "S" "K" "V" "N" "K" "F" "F" "V" "N" "K" "K"
[289] "S" "N" "I" "L" "K" "A" "G" "S" "L" "T" "A" "F" "P" "I" "I" "E"
[305] "T" "L" "E" "G" "D" "V" "T" "S" "F" "I" "P" "T" "N" "V" "I" "S"
[321] "I" "T" "D" "G" "Q" "I" "F" "L" "D" "T" "N" "L" "F" "N" "S" "G"
[337] "I" "R" "P" "S" "I" "N" "V" "G" "L" "S" "V" "S" "R" "V" "G" "G"
[353] "A" "A" "Q" "Y" "K" "I" "I" "K" "K" "L" "S" "G" "D" "I" "R" "I"
[369] "M" "L" "A" "Q" "Y" "R" "E" "L" "E" "A" "F" "S" "K" "F" "S" "S"
[385] "D" "L" "D" "S" "E" "T" "K" "N" "Q" "L" "I" "I" "G" "E" "K" "I"
[401] "T" "I" "L" "M" "K" "Q" "N" "I" "H" "D" "V" "Y" "D" "I" "F" "E"
[417] "L" "I" "L" "I" "L" "L" "I" "I" "K" "H" "D" "F" "F" "R" "L" "I"
[433] "P" "I" "N" "Q" "V" "E" "Y" "F" "E" "N" "K" "I" "I" "N" "Y" "L"
[449] "R" "K" "I" "K" "F" "K" "N" "Q" "I" "E" "I" "D" "N" "K" "N" "L"
[465] "E" "N" "C" "L" "N" "E" "L" "I" "S" "F" "F" "I" "S" "N" "S" "I"
[481] "L"
$`lcl|AP009180.1_prot_BAF35039.1_8`
[1] "M" "I" "I" "K" "E" "I" "N" "S" "K" "I" "K" "I" "T" "T" "N" "I"
[17] "N" "K" "L" "T" "N" "T" "L" "S" "M" "I" "S" "L" "S" "K" "M" "N"
[33] "K" "Y" "I" "N" "L" "I" "N" "N" "L" "D" "Y" "I" "N" "I" "E" "L"
[49] "K" "K" "I" "L" "E" "Y" "I" "I" "I" "N" "I" "K" "S" "N" "V" "F"
[65] "C" "L" "I" "I" "I" "T" "S" "N" "K" "G" "L" "C" "G" "N" "L" "N"
[81] "N" "E" "I" "I" "K" "Y" "S" "L" "N" "Y" "I" "K" "N" "N" "K" "N"
[97] "L" "D" "L" "I" "L" "I" "G" "K" "K" "G" "I" "D" "F" "F" "N" "K"
[113] "K" "N" "F" "Y" "I" "K" "E" "K" "I" "I" "F" "K" "D" "N" "E" "L"
[129] "K" "N" "L" "V" "F" "N" "N" "K" "I" "L" "N" "D" "L" "K" "K" "Y"
[145] "E" "N" "I" "F" "F" "I" "S" "S" "K" "I" "I" "K" "N" "N" "V" "K"
[161] "I" "I" "K" "T" "D" "L" "Y" "L" "K" "K" "K" "Y" "N" "Y" "L" "I"
[177] "K" "H" "N" "F" "N" "Y" "D" "C" "F" "L" "K" "N" "F" "Y" "N" "Y"
[193] "N" "L" "K" "C" "L" "Y" "L" "N" "N" "L" "F" "C" "E" "L" "K" "S"
[209] "R" "M" "I" "T" "M" "K" "S" "A" "A" "D" "N" "S" "K" "K" "I" "I"
[225] "K" "D" "M" "K" "L" "I" "K" "N" "K" "I" "R" "Q" "F" "K" "V" "T"
[241] "Q" "D" "M" "L" "E" "I" "I" "N" "G" "S" "N" "L"
$`lcl|AP009180.1_prot_BAF35040.1_9`
[1] "M" "I" "G" "R" "I" "V" "Q" "I" "L" "G" "S" "I" "V" "D" "V" "E"
[17] "F" "K" "K" "N" "N" "I" "P" "Y" "I" "Y" "N" "A" "L" "F" "I" "K"
[33] "E" "F" "N" "L" "Y" "L" "E" "V" "Q" "Q" "Q" "I" "G" "N" "N" "I"
[49] "V" "R" "T" "I" "A" "L" "G" "S" "T" "Y" "G" "L" "K" "R" "Y" "L"
[65] "L" "V" "I" "D" "T" "K" "K" "P" "I" "L" "T" "P" "V" "G" "N" "C"
[81] "T" "L" "G" "R" "I" "L" "N" "V" "L" "G" "N" "P" "I" "D" "N" "N"
[97] "G" "E" "I" "I" "S" "N" "K" "K" "K" "P" "I" "H" "C" "S" "P" "P"
[113] "K" "F" "S" "D" "Q" "V" "F" "S" "N" "N" "I" "L" "E" "T" "G" "I"
[129] "K" "V" "I" "D" "L" "L" "C" "P" "F" "L" "R" "G" "G" "K" "I" "G"
[145] "L" "F" "G" "G" "A" "G" "V" "G" "K" "T" "I" "N" "M" "M" "E" "L"
[161] "I" "R" "N" "I" "A" "I" "E" "H" "K" "G" "C" "S" "V" "F" "I" "G"
[177] "V" "G" "E" "R" "T" "R" "E" "G" "N" "D" "F" "Y" "Y" "E" "M" "K"
[193] "E" "S" "N" "V" "L" "D" "K" "V" "S" "L" "I" "Y" "G" "Q" "M" "N"
[209] "E" "P" "S" "G" "N" "R" "L" "R" "V" "A" "L" "T" "G" "L" "S" "I"
[225] "A" "E" "E" "F" "R" "E" "M" "G" "K" "D" "V" "L" "L" "F" "I" "D"
[241] "N" "I" "Y" "R" "F" "T" "L" "A" "G" "T" "E" "I" "S" "A" "L" "L"
[257] "G" "R" "M" "P" "S" "A" "V" "G" "Y" "Q" "P" "T" "L" "A" "E" "E"
[273] "M" "G" "K" "L" "Q" "E" "R" "I" "S" "S" "T" "K" "N" "G" "S" "I"
[289] "T" "S" "V" "Q" "A" "I" "Y" "V" "P" "A" "D" "D" "L" "T" "D" "P"
[305] "S" "P" "S" "T" "T" "F" "T" "H" "L" "D" "S" "T" "I" "V" "L" "S"
[321] "R" "Q" "I" "A" "E" "L" "G" "I" "Y" "P" "A" "I" "D" "P" "L" "E"
[337] "S" "Y" "S" "K" "Q" "L" "D" "P" "Y" "I" "V" "G" "I" "E" "H" "Y"
[353] "E" "I" "A" "N" "S" "V" "K" "F" "Y" "L" "Q" "K" "Y" "K" "E" "L"
[369] "K" "D" "T" "I" "A" "I" "L" "G" "M" "D" "E" "L" "S" "E" "N" "D"
[385] "Q" "I" "I" "V" "K" "R" "A" "R" "K" "L" "Q" "R" "F" "F" "S" "Q"
[401] "P" "F" "F" "V" "G" "E" "I" "F" "T" "G" "I" "K" "G" "E" "Y" "V"
[417] "N" "I" "K" "D" "T" "I" "Q" "C" "F" "K" "N" "I" "L" "N" "G" "E"
[433] "F" "D" "N" "I" "N" "E" "K" "N" "F" "Y" "M" "I" "G" "K" "I"
$`lcl|AP009180.1_prot_BAF35041.1_10`
[1] "M" "N" "L" "L" "I" "L" "S" "I" "K" "N" "I" "I" "E" "Y" "K" "N"
[17] "A" "S" "I" "L" "N" "V" "K" "T" "Y" "L" "K" "L" "F" "S" "I" "M"
[33] "N" "N" "H" "I" "N" "N" "I" "C" "D" "V" "N" "Q" "I" "K" "L" "I"
[49] "F" "K" "N" "K" "I" "I" "N" "I" "R" "I" "N" "N" "G" "F" "L" "F"
[65] "Q" "K" "K" "N" "N" "T" "K" "I" "I" "C" "N" "F" "Y" "E" "F" "L"
read.fasta() returns a list of character vectors. Each element holds one sequence. The first sequence is:
[1] "M" "N" "T" "I" "F" "S" "R" "I" "T" "P" "L" "G" "N" "G" "T" "L"
[17] "C" "V" "I" "R" "I" "S" "G" "K" "N" "V" "K" "F" "L" "I" "Q" "K"
[33] "I" "V" "K" "K" "N" "I" "K" "E" "K" "I" "A" "T" "F" "S" "K" "L"
[49] "F" "L" "D" "K" "E" "C" "V" "D" "Y" "A" "M" "I" "I" "F" "F" "K"
[65] "K" "P" "N" "T" "F" "T" "G" "E" "D" "I" "I" "E" "F" "H" "I" "H"
[81] "N" "N" "E" "T" "I" "V" "K" "K" "I" "I" "N" "Y" "L" "L" "L" "N"
[97] "K" "A" "R" "F" "A" "K" "A" "G" "E" "F" "L" "E" "R" "R" "Y" "L"
[113] "N" "G" "K" "I" "S" "L" "I" "E" "C" "E" "L" "I" "N" "N" "K" "I"
[129] "L" "Y" "D" "N" "E" "N" "M" "F" "Q" "L" "T" "K" "N" "S" "E" "K"
[145] "K" "I" "F" "L" "C" "I" "I" "K" "N" "L" "K" "F" "K" "I" "N" "S"
[161] "L" "I" "I" "C" "I" "E" "I" "A" "N" "F" "N" "F" "S" "F" "F" "F"
[177] "F" "N" "D" "F" "L" "F" "I" "K" "Y" "T" "F" "K" "K" "L" "L" "K"
[193] "L" "L" "K" "I" "L" "I" "D" "K" "I" "T" "V" "I" "N" "Y" "L" "K"
[209] "K" "N" "F" "T" "I" "M" "I" "L" "G" "R" "R" "N" "V" "G" "K" "S"
[225] "T" "L" "F" "N" "K" "I" "C" "A" "Q" "Y" "D" "S" "I" "V" "T" "N"
[241] "I" "P" "G" "T" "T" "K" "N" "I" "I" "S" "K" "K" "I" "K" "I" "L"
[257] "S" "K" "K" "I" "K" "M" "M" "D" "T" "A" "G" "L" "K" "I" "R" "T"
[273] "K" "N" "L" "I" "E" "K" "I" "G" "I" "I" "K" "N" "I" "N" "K" "I"
[289] "Y" "Q" "G" "N" "L" "I" "L" "Y" "M" "I" "D" "K" "F" "N" "I" "K"
[305] "N" "I" "F" "F" "N" "I" "P" "I" "D" "F" "I" "D" "K" "I" "K" "L"
[321] "N" "E" "L" "I" "I" "L" "V" "N" "K" "S" "D" "I" "L" "G" "K" "E"
[337] "E" "G" "V" "F" "K" "I" "K" "N" "I" "L" "I" "I" "L" "I" "S" "S"
[353] "K" "N" "G" "T" "F" "I" "K" "N" "L" "K" "C" "F" "I" "N" "K" "I"
[369] "V" "D" "N" "K" "D" "F" "S" "K" "N" "N" "Y" "S" "D" "V" "K" "I"
[385] "L" "F" "N" "K" "F" "S" "F" "F" "Y" "K" "E" "F" "S" "C" "N" "Y"
[401] "D" "L" "V" "L" "S" "K" "L" "I" "D" "F" "Q" "K" "N" "I" "F" "K"
[417] "L" "T" "G" "N" "F" "T" "N" "K" "K" "I" "I" "N" "S" "C" "F" "R"
[433] "N" "F" "C" "I" "G" "K"
If the DNA has been sequenced then the GC-content can be accurately calculated by simple arithmetic.
GC-content is calculated as \[\frac{G+C}{A+T+G+C}\] (multiply by 100 to express it as a percentage).
We want to find the GC content of the first gene of E. coli using the table() function.
How do we find it?
We download coding sequences in FASTA format from NCBI.
Which one?
The one with accession number NC_000913.
What is the “first gene”? The one nearest the replication origin?
We take the first element of the list.
Which strand?
The strand in the FASTA file.
How do we find a gene?
We use the genes defined in the FASTA file.
Let’s assume that the gene is in a vector V.
Use the table() function to count each nucleotide.
Then calculate using the GC formula.
V <- genes[[1]]
count <- table(V)
(count["g"]+count["c"])/(count["g"]+count["c"]+count["a"]+count["t"])
g
0.5151515
This is the same solution I used in previous years.
But I found that there is a problem.
What if the gene is TGTGTGTGTG?
G T
4 4
<NA>
NA
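A sketch of why this happens, using a short made-up sequence (not the gene from the FASTA file): table() only creates entries for the letters that actually occur, so asking for a missing letter gives NA, and any arithmetic with NA is NA.

```r
# Hypothetical sequence without any "C"
V <- c("T", "G", "T", "G", "T", "G")

count <- table(V)
count            # only G and T are present
count["C"]       # NA: there is no entry named "C"

gc <- (count["G"] + count["C"]) / length(V)
gc               # NA, because NA propagates through the sum
```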
Use sum(logic) instead of table().
What if the gene is TGTGTGTGTG? Now it works correctly.
[1] 0
There are 0 “C” nucleotides.
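A sketch of the counting idea on a short made-up sequence (an assumption for illustration, not the real gene): comparing with == gives a logical vector, and sum() counts the TRUE values, so a missing letter gives 0 rather than NA.

```r
# Hypothetical sequence without any "C"
V <- c("T", "G", "T", "G", "T", "G")

V == "C"        # all FALSE
sum(V == "C")   # 0, not NA
sum(V == "G")   # 3

(sum(V == "G") + sum(V == "C")) / length(V)
```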
But we have another problem: sometimes the DNA is in lowercase.
[1] 0
This code gives us a wrong answer: there are 2 “C” nucleotides.
Remember that, for the computer, upper- and lower-case letters are different.
We use the function toupper(). It takes a string and transforms it into upper case.
[1] 2
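A sketch of the lowercase problem and its fix. The sequence below is made up for illustration; it contains two lowercase "c" letters.

```r
# Hypothetical lowercase sequence with two "c"
V <- c("t", "g", "c", "t", "c", "a")

sum(V == "C")            # 0: "c" and "C" are different characters
sum(toupper(V) == "C")   # 2: convert first, then compare
```

toupper() works on every element of the vector, so it can be applied once before any counting.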
This code implements the idea we developed in the last class:
V <- toupper(genes[[1]])
count_C <- sum(V=="C")
count_G <- sum(V=="G")
GC_content <- (count_C +count_G)/length(V)
print(GC_content)
[1] 0.5151515
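Since the same steps will be reused, one option is to wrap them in a function. This is only a sketch; the name gc_content() is hypothetical and the test sequences are made up.

```r
# Fraction of G and C in a vector of single letters
gc_content <- function(V) {
  V <- toupper(V)                               # accept lower- or upper-case input
  (sum(V == "C") + sum(V == "G")) / length(V)   # counts are 0, never NA
}

gc_content(c("a", "t", "g", "c"))   # half of the letters are G or C
gc_content(c("T", "G", "T", "G"))   # works without any "C"
```

With a list of genes, the same function could be called on each element, for example gc_content(genes[[1]]).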
We will use it in the next class.
read.fasta() gives named lists
toupper() to get uppercase letters
sum(V=="X") to count the “X”s in V
== to compare each element with a value
length(V) for the sequence length