This course is an introduction to Data Science for students of Molecular Biology. We use the R language to learn the basic tools to handle structured data and extract valuable scientific information from it.
Classes
This is the teaching plan, based on the previous year. Here you will find the slides from the classes and other supplementary material. Note that some things are said in class but not written down, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.
Why “Computing in Molecular Biology”? (Sep 17, 2018).
What is a computer? Why do we care? [Slides].

Representing things with numbers. (Sep 20, 2018).
Memory, Files and Documents. [Slides].

Structured Documents. (Sep 24, 2018).
Introduction to RStudio and to Markdown. [Slides].

Using R and RStudio. (Sep 27, 2018).
Basic usage of RStudio. Introduction to R. Basic Data Types: Numeric, Character, Logical and Factor. [Slides].

Structured documents and data. (Oct 1, 2018).
RMarkdown. Indexing Vectors. [how-to-solve-it.Rmd], [how-to-solve-it.html], [style.css], [Slides].

Mixing Markdown and R. (Oct 4, 2018).
How to answer Quizzes, Exams and Make-ups. [class06.Rmd], [slides06.Rmd], [Slides].

Lists: Mixing different types of data. (Oct 8, 2018).
Also, a comment about digital signatures, and a Quiz you have to do. [quiz-1.html], [answers.Rmd], [quiz-1-answers.Rmd], [Slides].

Welcome to the Matrix. (Oct 11, 2018).
Structures in two dimensions. Matrices and Data Frames. [Slides].

Using Data Frames. (Oct 15, 2018).
Telling stories. [survey1-tidy.txt], [class09.Rmd], [Slides].

Telling stories. (Oct 18, 2018).
Introduction to Descriptive Statistics. [Slides].

Quiz 2. (Oct 22, 2018).
Second rehearsal for Midterm Exam. [Document].

Reading and Sorting Lists and Data Frames. (Oct 25, 2018).
[midterm-calendar.Rmd], [midterm-calendar.html], [Slides].

Data Visualization. (Nov 12, 2018).
Telling stories with pictures. “One image is worth a thousand words”. Plots, barplots, histograms. Making “nice” drawings. Adding points and lines. [Slides].

More Data Visualization. (Nov 15, 2018).
Plotting two vectors, numeric or factor. Formulas. [Slides].

Quiz 3. (Nov 19, 2018).
Practice plotting data. [Document].

Subsets and formulas. (Nov 22, 2018).
Easier ways to plot. Also, introduction to Linear Models. [Slides].

Hooke’s Law on Coils. (Nov 26, 2018).
A simple application of linear models. [rubber1.txt], [coins.txt], [marbles.txt], [Slides].

Logarithmic scales. (Nov 29, 2018).
Not all lines are straight lines. Exponential growth in Science and Technology. What will be your future? [kleiber.txt], [Transistor_count.txt], [dna_price.txt], [Slides].

Quiz 4. (Dec 3, 2018).
Practice with linear models. [sra_bases.txt], [planets.txt], [Slides].

Polynomial Models. (Dec 6, 2018).
Not all lines are straight lines. [Slides].

Interactions between variables. (Dec 10, 2018).
Modeling the results of Quiz 4. [fall-raw.txt], [fall-tidy.txt], [free-fall.txt], [Slides].

How to succeed at the Exam. (Dec 13, 2018).
Linear models with factors. [cmb.Rdata], [Slides].

Quiz 5. (Dec 17, 2018).
Fecal DNA. [NaCl-elutions.txt], [animals.txt], [capture-experiments.txt], [fecal-captures.txt], [libraries.txt], [spike-in-DNA.txt], [Slides].

Last Class. (Dec 20, 2018).
Some things you need to know. [class24.Rmd], [class24.html], [Slides].
Other posts
Some Free Online Resources about R
- How to read an R help page
- Getting Started with R
- Free Course: Introduction to R
- TryR
- Introduction to Data Science
- Book: R for Data Science
- Book: Data Visualization: A practical introduction, by Kieran Healy, Duke University
References
Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library.
Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.