October 25, 2018
Find the name of the person with the median age
x$age
[1] 20 23 26 29 32 35 38 41 44
Step 1: find the median age
median(x$age)
[1] 32
Step 2: find which ages are equal to the median age
x$age==median(x$age)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[9] FALSE
Step 3: find who has the age equal to the median age
x$name[x$age==median(x$age)]
[1] "Elif"
Some people tried x$name[median(x$age)], which is the same as x$name[32]
That is, the element of x$name at position 32, which does not exist because x$name has only 9 elements
The value of the median is not the same as the position of the median
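A minimal sketch of the difference between indexing by value and by position. The ages are the ones shown above; the data frame is rebuilt by hand, and every name except "Elif" is invented for illustration.

```r
# Hypothetical data: ages from the slides; all names except "Elif" are made up
x <- data.frame(
  name = c("Ali", "Bora", "Can", "Deniz", "Elif", "Fatma", "Gizem", "Hakan", "Irem"),
  age  = c(20, 23, 26, 29, 32, 35, 38, 41, 44),
  stringsAsFactors = FALSE
)

# Correct: a logical index keeps the names whose age equals the median
by_value <- x$name[x$age == median(x$age)]
by_value

# Wrong: the median (32) is used as a position, but there are only 9 names
by_position <- x$name[median(x$age)]
by_position   # NA, because position 32 does not exist
```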
min()
What if we look for the name of the youngest person?
The minimal age is
min(x$age)
[1] 20
Which is the element of minimal age?
which.min(x$age)
[1] 1
max()
The maximal age is
max(x$age)
[1] 44
Which is the element of maximal age?
which.max(x$age)
[1] 9
This difference is important. We will use it later
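As a small illustration of why the position matters, the sketch below uses which.min() and which.max() as indices into a second vector. The ages come from the slides; the names are hypothetical.

```r
# Ages from the slides; names are hypothetical, for illustration only
age  <- c(20, 23, 26, 29, 32, 35, 38, 41, 44)
name <- c("Ali", "Bora", "Can", "Deniz", "Elif", "Fatma", "Gizem", "Hakan", "Irem")

min(age)         # the smallest value
which.min(age)   # the position of the smallest value

# The position can be used as an index into another vector
youngest <- name[which.min(age)]
oldest   <- name[which.max(age)]
```

Here min(age) cannot be used as an index, but which.min(age) can.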
We have our own data
survey <- read.table("survey1-tidy.txt")
weight <- survey$weight
weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0
sort(weight)
 [1]  42.5  47.0  50.0  50.0  52.0  53.0  54.0  54.0
 [9]  55.0  55.0  55.0  55.0  55.0  55.0  56.0  56.0
[17]  57.0  58.0  58.0  58.0  59.0  60.0  60.0  60.0
[25]  63.0  64.0  65.0  65.0  65.0  67.0  67.0  68.0
[33]  68.0  68.0  69.0  70.0  72.0  74.0  75.0  75.0
[41]  75.0  76.0  77.0  77.0  78.0  80.0  81.0  85.0
[49]  94.0 105.0 106.0
sort(weight, decreasing=TRUE)
 [1] 106.0 105.0  94.0  85.0  81.0  80.0  78.0  77.0
 [9]  77.0  76.0  75.0  75.0  75.0  74.0  72.0  70.0
[17]  69.0  68.0  68.0  68.0  67.0  67.0  65.0  65.0
[25]  65.0  64.0  63.0  60.0  60.0  60.0  59.0  58.0
[33]  58.0  58.0  57.0  56.0  56.0  55.0  55.0  55.0
[41]  55.0  55.0  55.0  54.0  54.0  53.0  52.0  50.0
[49]  50.0  47.0  42.5
The command sort() works only for vectors
To sort a data frame, we first need to choose which column we use to order
We know the position of the smallest and the largest
which.min(weight)
[1] 22
which.max(weight)
[1] 17
We need the positions in between
For that we use the order() command
weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0
order() to sort a data frame
order(weight)
 [1] 22 28 14 36 27 40 42 47 16 23 33 34 37 43  3  7 13
[18]  2 26 38 11  5 25 35 49 46 20 29 45  1 30 18 19 31
[35] 24 12 44 32  8 39 50 21  6 51 15  9 41 48  4 10 17
This gives us the position of the smallest, the second smallest, and so on up to the largest
survey[order(weight),]
     Gender birth_day birth_month birth_year height_cm weight_kg handness hand_span_cm
st22 Female        13          10       1997       155      42.5    Right           20
st28 Female         7           7       1997       166      47.0    Right           20
st14 Female         3           7       1997       160      50.0    Right           15
st36 Female        24           3       1998       167      50.0    Right           30
st27 Female        13          10       1997       171      52.0    Right           25
st40 Female         5           2       1998       157      53.0    Right           20
st42 Female        18           5       1997       165      54.0    Right           18
st47 Female        29           7       1997       160      54.0    Right           20
st16 Female         3           9       2018       164      55.0    Right           20
st23 Female         2          10       1998       172      55.0    Right           20
st33 Female        21           5       1998       168      55.0    Right           14
st34 Female         3           9       1998       174      55.0    Right           22
st37 Female        17           9       1998       173      55.0    Right            8
st43 Female        23           5       1999       178      55.0    Right           12
st3  Female        28           1       1995       170      56.0     Left           18
st7  Female         5           4       1996       173      56.0    Right           21
st13 Female         9           6       1998       158      57.0    Right           19
st2  Female         9          10       1995       167      58.0    Right           18
st26 Female        17           5       1998       165      58.0    Right           19
st38 Female         2           1       1999       162      58.0    Right           19
st11   Male        26          12       1997       176      59.0    Right           24
st5  Female         1           1       1991       160      60.0    Right           19
st25 Female        17           8       1998       163      60.0    Right           15
st35 Female         1           9       1998       174      60.0    Right           24
st49 Female         2           5       1999       165      63.0     Left           17
st46   Male         6          11       1998       163      64.0    Right           15
st20 Female        30           6       1997       158      65.0    Right            8
st29   Male        28           7       1998       185      65.0     Left           22
st45   Male         6          12       1997       166      65.0    Right           15
st1    Male         1           2       1993       179      67.0    Right           15
st30   Male         5           1       1997       178      67.0    Right           24
st18 Female        16          11       1998       163      68.0    Right           13
st19 Female         3           5       1998       162      68.0    Right           13
st31   Male        27          11       1997       180      68.0    Right           19
st24 Female        10           6       1998       159      69.0    Right           18
st12   Male         9           2       1997       183      70.0    Right           20
st44 Female        19           9       1997       174      72.0    Right           16
st32   Male        29           8       1998       170      74.0    Right           25
st8  Female        14           1       1997       162      75.0     Left           18
st39   Male        19          11       1998       175      75.0    Right           20
st50   Male        31          10       1998       184      75.0    Right           22
st21   Male        15           1       2018       175      76.0    Right           20
st6    Male        26           9       1996       175      77.0    Right           18
st51   Male         9           3       1996       177      77.0    Right           23
st15   Male        13          10       1998       182      78.0    Right           21
st9    Male         1           5       1997       173      80.0    Right           16
st41   Male        18           5       1997       181      81.0    Right           20
st48   Male        14           3       1993       195      85.0    Right           30
st4    Male        11           8       1992       180      94.0    Right           25
st10   Male        25           6       1997       188     105.0    Right           20
st17   Male        10           1       1998       175     106.0    Right           15
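The same idea can be tried on a data frame small enough to check by eye. The sketch below uses a toy data frame with the values of the first four survey rows typed by hand; the name toy and the variables idx, by_weight and by_weight_desc are only for this example.

```r
# Toy data frame: first four survey rows, typed by hand for illustration
toy <- data.frame(
  height_cm = c(179, 167, 170, 180),
  weight_kg = c(67, 58, 56, 94),
  row.names = c("st1", "st2", "st3", "st4")
)

# order() returns positions: the lightest first, the heaviest last
idx <- order(toy$weight_kg)
idx

# Using these positions as the row index sorts the whole data frame
by_weight <- toy[idx, ]
by_weight

# decreasing = TRUE puts the heaviest first
by_weight_desc <- toy[order(toy$weight_kg, decreasing = TRUE), ]
by_weight_desc
```

Note that toy$weight_kg[idx] gives the same result as sort(toy$weight_kg), but the index can also be applied to the rows of the data frame.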
Packages add new functions to R. We install a package once with install.packages() and then load it with library()
knitr: a package for Rmarkdown
Knitr is the system that merges R code and Markdown to produce documents that depend on data
It has many functions. We used two of them:
knitr::kable() is a function to produce nicer tables
Another option is pander() from the pander package
knitr::opts_chunk$set() is used to set the default options for each chunk
kable()
survey[1:5,]
     Gender birth_day birth_month birth_year height_cm weight_kg handness hand_span_cm
st1    Male         1           2       1993       179        67    Right           15
st2  Female         9          10       1995       167        58    Right           18
st3  Female        28           1       1995       170        56     Left           18
st4    Male        11           8       1992       180        94    Right           25
st5  Female         1           1       1991       160        60    Right           19
kable()
knitr::kable(survey[1:5,])
| | Gender | birth_day | birth_month | birth_year | height_cm | weight_kg | handness | hand_span_cm |
|---|---|---|---|---|---|---|---|---|
| st1 | Male | 1 | 2 | 1993 | 179 | 67 | Right | 15 |
| st2 | Female | 9 | 10 | 1995 | 167 | 58 | Right | 18 |
| st3 | Female | 28 | 1 | 1995 | 170 | 56 | Left | 18 |
| st4 | Male | 11 | 8 | 1992 | 180 | 94 | Right | 25 |
| st5 | Female | 1 | 1 | 1991 | 160 | 60 | Right | 19 |
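A hedged sketch of the two knitr functions mentioned above, on a toy data frame typed by hand. It assumes the knitr package may or may not be installed, so the calls are guarded; the caption text and the choice of echo = FALSE are only examples.

```r
# Toy data frame: first three survey rows, typed by hand for illustration
toy <- data.frame(
  Gender    = c("Male", "Female", "Female"),
  height_cm = c(179, 167, 170),
  weight_kg = c(67, 58, 56),
  row.names = c("st1", "st2", "st3")
)

# Run the knitr calls only if the package is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  # kable() turns the data frame into a table; caption adds a title
  tab <- knitr::kable(toy, caption = "First rows of the survey")
  print(tab)

  # Default option for all chunks that follow: do not show the R code
  knitr::opts_chunk$set(echo = FALSE)
}
```

In an Rmarkdown document the opts_chunk$set() call usually goes in the first chunk, so that it applies to all the others.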
So far all the files we have used are structured
That is, they have rows and columns
We use read.table and write.table to read and write a data frame
Sometimes the data is not a table
people <- list(Ali=list(age=18, sex='M'), Bahar=list(age=19, sex='F'), valid=c(TRUE,FALSE))
people
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
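To see why this is not a table, it can help to look inside the list. The sketch below rebuilds the same list and uses only base R to get individual elements.

```r
people <- list(Ali   = list(age = 18, sex = "M"),
               Bahar = list(age = 19, sex = "F"),
               valid = c(TRUE, FALSE))

names(people)           # "Ali" "Bahar" "valid"
people$Ali$age          # 18
people[["Bahar"]]$sex   # "F"

# The elements have different types (two lists and a logical vector),
# so they cannot be the columns of one data frame
sapply(people, class)
```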
How can we read and write lists?
There are several options to store lists into files.
A good one is YAML, which looks like this:
Ali:
  age: 18.0
  sex: M
Bahar:
  age: 19.0
  sex: F
valid:
- yes
- no
Names and values are separated by :
List items start with -
We put --- before and after the YAML code
Google “YAML” for more info
We use YAML for the Rmarkdown metadata. For example
---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
author: "Put your name here"
number: STUDENT_NUMBER
date: "October 25, 2018"
output: html_document
---
library(yaml)
write_yaml(people, "datafile.yml")
persons <- read_yaml("datafile.yml")
persons
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
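A small round-trip check, as a sketch. It assumes the yaml package may not be installed, so the calls are guarded, and it writes to a temporary file instead of datafile.yml. The variable names path, same_names and same_age are only for this example.

```r
people <- list(Ali   = list(age = 18, sex = "M"),
               Bahar = list(age = 19, sex = "F"),
               valid = c(TRUE, FALSE))

# Run the yaml calls only if the package is installed
if (requireNamespace("yaml", quietly = TRUE)) {
  # as.yaml() returns the YAML text without writing a file
  cat(yaml::as.yaml(people))

  # Write to a temporary file and read it back
  path <- tempfile(fileext = ".yml")
  yaml::write_yaml(people, path)
  persons <- yaml::read_yaml(path)

  # The names and the values survive the round trip
  same_names <- identical(names(persons), names(people))
  same_age   <- persons$Ali$age == people$Ali$age
}
```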
references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25
Put all the references somewhere in the document, with --- before and after.
[@WatsonCrick1953] produces (Watson and Crick 1953)
[@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
[@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
@WatsonCrick1953 [p. 33] says blah becomes Watson and Crick (1953, 33) says blah
If you have a long list of all papers, and you use it on several documents, then you should put the references in a separate file
Then you write
bibliography: references.yml
in the document metadata
Format | File extension |
---|---|
BibLaTeX | .bib |
BibTeX | .bibtex |
Copac | .copac |
CSL JSON | .json |
CSL YAML | .yaml |
EndNote | .enl |
EndNote XML | .xml |
ISI | .wos |
MEDLINE | .medline |
MODS | .mods |
RIS | .ris |
It is good that RMarkdown uses all these formats
There are many tools to manage your paper collection
It is not enough to download PDFs and store them in a folder. They need to be organized and have a structure
Two good and free programs are Mendeley and Zotero
Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:
last paragraph...

# References
The bibliography will be inserted after this header. More info at
http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
Collado-Vides, J, H Salgado, E Morett, S Gama-Castro, V Jiménez-Jacinto, I Martínez-Flores, A Medina-Rivera, L Muñiz-Rascado, M Peralta-Gil, and A Santos-Zavaleta. 2009. “Bioinformatics Resources for the Study of Gene Regulation in Bacteria.” Journal of Bacteriology 191 (1): 23–31.
Watson, J. D., and F. H. C. Crick. 1953. “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid.” Nature 171 (4356): 737–38. https://doi.org/10.1038/171737a0.