This course is an introduction to Data Science for students of Molecular Biology. We use the R language to learn the basic tools to handle structured data and extract valuable scientific information from it.
Molecular Biology revolves around computing and informatics these days. Obtaining data is easy and cheap; processing data is hard and expensive to do and to learn. A molecular biologist has to be able to understand what he or she has produced. Otherwise, he or she is a pipetting robot. Informatics and computing are what you have to learn, so work harder.
The Computing in Molecular Biology course was really hard for us to understand. In the first year I failed with FF, then the next year I passed with AA. I was able to pass once I understood the purpose of the computational methods, how important they are, and how we can use them in molecular biology.
The course content and problems are very educational for beginner students, but the main problem is that they have no perspective on computational sciences; they are just thinking about passing the exam and moving on. Everything about the course depends on the students' behaviour. I think the lecturer makes a great effort to teach, so if students do not want to learn, they are the ones who lose.
This course was very interesting for me. I really didn’t like computers, and I didn’t know anything about the course or about programming. This course was very useful for me. I could understand your explanations and your body language, but the pace was too fast for me because everything was new and hard, so understanding was difficult.
Of course, the most important factor when choosing these courses is a fact we all know: the importance of computers in molecular biology. As we also know, the experiments we carry out and the data we obtain have no value unless they are properly analyzed and turned into a meaningful conclusion. The choice is of course yours, but it will be to your benefit to decide while keeping in mind that these courses are very important for us, and, as I suspect, while setting aside the scary stories that circulate around the school.
Homework
All quizzes and homework should be sent to andres.aravena+cmb@istanbul.edu.tr before the deadline to get a grade. Please be careful; otherwise you will get a grade of zero.
- Homework 1 (Deadline: Tuesday 8 of October at 9:00). Create an RMarkdown document with the same content and the same structure of a published paper.
- Homework 2 (Deadline: Tuesday 15 of October at 9:00). Practice for midterm exam. Vectors, indices, and general ideas about using R.
- Homework 3 (Deadline: Monday 4 of November at 9:00). Practice for midterm exam. Lists and data frames.
- Homework 4 (Deadline: Tuesday 3 of December at 9:00). Plot vectors, choose colors, symbols, and size.
- Homework 5 (Deadline: Tuesday 10 of December at 9:00). Scatter plots, choose colors, size, titles, and scale.
- Homework 6 (Deadline: Tuesday 17 of December at 8:00). Subsets and linear models.
- Homework 7 (Deadline: Tuesday 31 of December at 8:00). Exam Rehearsal.
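As a rough illustration of the kind of R that these homeworks practice (vectors, indices, data frames, and subsets), here is a minimal sketch. The data and all variable names are invented for this example and are not taken from the actual homework files.

```r
# Hypothetical measurements, invented for illustration only
weight <- c(60, 72, 55, 90)
names(weight) <- c("ana", "ben", "cem", "deniz")

# Indexing a vector by position, by name, and by a logical condition
second <- weight[2]
cem    <- weight["cem"]
heavy  <- weight[weight > 65]

# A data frame stores vectors of the same length as columns
people <- data.frame(
  name   = names(weight),
  weight = unname(weight),
  height = c(165, 180, 158, 185)
)

# Keep only the rows that satisfy a condition
tall <- subset(people, height > 170)
```

Printing `heavy` or `tall` in the RStudio console is a quick way to check that an index or a condition selects what you expect.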
Classes
Here you will find the slides from the classes and other supplementary material. Note that some things are said but not written, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.
- Structured Documents (Sep 17, 2019). Introduction to RStudio and to Markdown. [Slides]
- Why “Computing in Molecular Biology”? (Sep 17, 2019). What is a computer? Why do we care? [Slides]
- Practice with Structured Documents (Sep 24, 2019). Introduction to RStudio and to Markdown. [Slides]
- Using R and RStudio (Oct 1, 2019). Basic usage of RStudio. Introduction to R. Basic Data Types: Numeric, Character, Logic and Factor. [Slides]
- Making and Indexing Vectors (Oct 8, 2019). Handling structured data. [Slides]
- Combining Markdown and R (Oct 8, 2019). How to answer Quizzes, Exams and Make-ups. [class06.Rmd], [slides06.Rmd], [Slides]
- Lists: Mixing different types of data (Oct 15, 2019). Also, a comment about digital signatures, and a Quiz you have to do. [Slides]
- Welcome to the Matrix (Oct 15, 2019). Structures in two dimensions. Matrices and Data Frames. [Slides]
- Telling stories (Oct 22, 2019). Introduction to Descriptive Statistics. [Slides]
- Using Data Frames (Oct 22, 2019). Telling stories. [Slides]
- Data Visualization (Nov 19, 2019). Telling stories with pictures. “One image is worth a thousand words”. Plots, barplots, histograms. Making “nice” drawings. Adding points and lines. [survey1-tidy.txt], [midterm.txt], [Slides]
- More Data Visualization (Nov 26, 2019). Plotting two vectors, numeric or factor. Formulas. [Slides]
- Handling Lists and Data Frames (Nov 26, 2019). [Slides]
- Hooke’s Law (Dec 3, 2019). A simple application of linear models. [rubber.txt], [Slides]
- Subsets and formulas (Dec 3, 2019). Easier ways to plot. Also, introduction to Linear Models. [survey2019.txt], [Slides]
- Logarithmic scales (Dec 10, 2019). Not all lines are straight lines. Exponential growth in Science and Technology. What will be your future? [kleiber.txt], [Transistor_count.txt], [dna_price.txt], [Slides]
- Logarithmic models (Dec 10, 2019). Not all lines are straight lines. [Slides]
- Practice with Linear Models (Dec 17, 2019). Get ready. [sra_bases.txt], [Slides]
- Polynomial Models (Dec 17, 2019). Not all lines are straight lines. [free-fall.txt], [Slides]
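The last classes revolve around linear models and logarithmic scales. The sketch below gives a rough idea of what that looks like in R, assuming a Hooke's-law-style experiment and an exponentially growing quantity. All numbers and names here are made up for illustration; they do not come from rubber.txt or any other course file.

```r
# Invented data: mass hung from a spring (g) and the resulting length (cm)
spring <- data.frame(
  mass      = c(0, 50, 100, 150, 200),
  length_cm = c(10.0, 12.1, 13.9, 16.0, 18.1)
)

# Fit a straight line: length_cm = intercept + slope * mass
model <- lm(length_cm ~ mass, data = spring)
slope <- coef(model)[["mass"]]

# Made-up quantity that doubles every year
year  <- 2000:2005
count <- 100 * 2^(year - 2000)

# Exponential growth becomes a straight line after taking logarithms,
# so a linear model can be fitted to log2(count)
log_model <- lm(log2(count) ~ year)
doublings_per_year <- coef(log_model)[["year"]]
```

To see the same idea graphically, `plot(year, count, log = "y")` draws the vertical axis on a logarithmic scale, where exponential growth appears as a straight line.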
Attendance
By regulation from the Rectory, students need to attend at least 70% of the classes. The attendance book is updated every week and can be seen in Google Sheets.
Some Free Online Resources about R
- How to read an R help page
- Getting Started with R
- Free Course: Introduction to R
- TryR
- Introduction to Data Science
- Book R for Data Science
- Book Data Visualization: A practical introduction by Kieran Healy, Duke University
About RMarkdown
Recommended readings
Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library.
Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.