This course is new and differs from the previous year's.
Things to do
- Register in the Forum
- Install the required software
- Read the class slides
- Download some of our favorite sequences
- Read The Biologist Toolbox: Drawing Systems and Simulating Systems on the computer
This course is an introduction to Computational Thinking. We will use the tools we learned in the previous course and apply them to model and simulate scientific experiments as a way to understand them.
Homework
All quizzes and homework should be sent to (andres.aravena+cmb@istanbul.edu.tr) before the deadline to get a grade. Please be careful with the address and the deadline; otherwise you will get a grade of zero.
- Homework 1 (Deadline: Wednesday 31 of March at 23:59).
  Write to learn.
- Homework 2 (Deadline: Friday 19 of March at 8:59).
  Teach the computer how to write the flag of Turkey.
- Homework 3 (Deadline: Friday 26 of March at 8:59).
  Practice writing functions and applying them to several elements of a list.
- Homework 4 (Deadline: Friday 2 of April at 8:59).
  Practice writing functions, for loops and conditional blocks.
- Homework 5 (Deadline: Friday 9 of April at 9:00).
  Count rabbits with recursive functions and with a loop.
- Homework 6 (Deadline: Friday 16 of April at 9:00).
  Simulate a predator-prey system, using the Lotka-Volterra model.
- Homework 7 (Deadline: Monday 31 of May at 9:00).
  Practice the Monte Carlo method. Simulate “complex” random systems by decomposing them into simpler ones.
- Homework 8 (Deadline: Friday 4 of June at 9:00).
  What will be your score in the exam? Use simulation to see what will probably happen.
- Homework 9 (Deadline: Friday 11 of June at 9:00).
  We cannot predict the future, but we can make educated guesses. Practice educating your guesses.
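As a taste of what "with recursive functions and with a loop" can mean, here is a minimal sketch in R. It assumes the classic Fibonacci rabbit model (each month's pairs are the sum of the two previous months); the actual homework may define the rabbits differently, and the function names `rabbits_recursive` and `rabbits_loop` are made up for this illustration.

```r
# Assumption: classic Fibonacci model, valid for n >= 1.
# Recursive version: the function calls itself on smaller problems.
rabbits_recursive <- function(n) {
  if (n <= 2) {
    return(1)
  }
  rabbits_recursive(n - 1) + rabbits_recursive(n - 2)
}

# Loop version: fill a vector month by month.
rabbits_loop <- function(n) {
  pairs <- numeric(max(n, 2))
  pairs[1:2] <- 1
  if (n > 2) {
    for (month in 3:n) {
      pairs[month] <- pairs[month - 1] + pairs[month - 2]
    }
  }
  pairs[n]
}

# Both approaches should give the same numbers.
recursive_counts <- sapply(1:10, rabbits_recursive)
loop_counts <- sapply(1:10, rabbits_loop)
print(recursive_counts)
print(loop_counts)
```

The recursive version is closer to the definition, while the loop version does far less repeated work for large `n`.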
Classes
Here you will find the slides from the classes and other supplementary material. Notice that some things are said but not written, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.
- Class 1: Introduction to Computational Thinking. (Mar 12, 2021). [Video],[Slides].
  Motivation of the course. Learn how to solve hard problems. Get a super-power.
- Class 2: Handling DNA in the computer. (Mar 12, 2021). [Video],[Slides].
  Proteins and DNA are sequences that can easily be handled by the computer. Learn how to find them on the web, by accession number or taxonomic id. We use FASTA format and we read them in R.
- Class 3: Lists with names, and CG content. (Mar 19, 2021). [Video],[Slides].
  Review of Quiz 1. The final discussion of lists. Implementing our ideas in R code. Solving some issues.
- Class 4: GC content of all genes. Patterns and Abstraction. (Mar 19, 2021). [Video],[Slides].
  How to do the same thing again and again, without getting tired or bored. Writing functions and applying them to vectors and lists.
- Class 5: Practice with sapply. The FOR loop. (Mar 19, 2021). [Video],[Slides].
  Two ways to repeat code. GC content and GC skew. These are the complete slides we wrote during the practice session.
- Class 6: Local DNA statistics. (Mar 26, 2021). [Video],[Slides].
  Sliding Windows.
- Class 7: Making decisions. (Mar 26, 2021). [Video],[Slides].
  Practice using IF-THEN-ELSE. Finding the smallest value.
- Class 8: Patterns in patterns in patterns…. (Mar 26, 2021). [Video],[Slides].
  In order to understand recursion, one must first understand recursion.
- Class 9: Finding the replication origin. (Apr 2, 2021). [Video],[Slides].
  GC skew points us in the right direction, but it is not easy. Accumulative sums and which.max help a lot.
- Class 10: Accumulative sums and Systems. (Apr 2, 2021). [Video],[Slides].
  This is one of the important ideas of the course. To understand complex things, we decompose them into interconnected parts. Complex behaviors can emerge from combining simple parts. Dumb ants can make a smart ant colony.
- Class 11: Systems in Biology and Beyond. (Apr 9, 2021). [Slides].
  We can describe systems as parts and interactions, and simulate their emergent behavior.
- Class 12: Exercises on Systems. (Apr 9, 2021). [Slides].
  We can describe systems as parts and interactions, and simulate their emergent behavior. See also:
  - How to draw systems (and networks) in R.
  - code for fat_petri_dish.
  - graph for fat_petri_dish.
  - code for rabbits.
  - graph for rabbits.
  - code for reverse complement.
  - code for water system.
- Class 13: Long-term behavior and effect of initial conditions. (Apr 16, 2021). [Slides].
  We can describe systems as parts and interactions, and simulate their emergent behavior.
- Class 14: Can we predict the future? (Apr 30, 2021). [Slides].
  Dynamic systems can be deterministic yet unpredictable. See also:
  - Modern code to draw quad_map.R.
- Class 15: Probabilities. (Apr 30, 2021). [Slides].
  People think that probabilities are about games. Instead, they are tools for thinking. Thinking about decisions when we have incomplete information. Thinking about the future. Thinking about the meaning of our experiment’s results.
- Class 16: Easy and Hard problems. (Apr 30, 2021). [Slides].
  Easy problems are “going downhill”, and hard ones are “uphill”. Why it is safe to use online banking.
- Class 17: Virtual experiments. (May 7, 2021). [Slides].
  What can happen? Simulating random systems.
- Class 18: Complex random systems. (May 7, 2021). [Slides].
  What can happen? Simulating random systems.
- Class 19: Comments on Midterm Exam. (May 21, 2021). [Slides].
  What can happen? Simulating random systems.
- Class 20: Population and Samples. (May 21, 2021). [Slides].
  We want to know a big population, but we can only observe a small sample. How are they related?
- Class 21: Central Limit Theorem. (May 28, 2021). [Slides].
  We want to know a big population, but we can only observe a small sample. How are they related?
- Class 22: Confidence Intervals. (May 28, 2021). [Slides].
  We want to know a big population, but we can only observe a small sample. How are they related?
- Class 23: Things that you must know. (Jun 4, 2021). [Slides].
  If you do not know this, you will probably fail the course.
- Class 24: Practice. (Jun 4, 2021). [Slides].
  Makes perfect.
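Several of the early classes revolve around the same small set of ideas: write a function, apply it to every element of a list with `sapply` or a `for` loop, then look at GC skew along a sequence with accumulative sums and `which.max`. The sketch below puts these pieces together on toy data. The sequences, the window size, and the names `gc_content` and `gc_skew` are assumptions made for illustration; they are not taken from the slides.

```r
# Toy "genes" as vectors of single letters (made up for illustration).
genes <- list(
  geneA = c("a", "t", "g", "c", "g", "g"),
  geneB = c("a", "a", "t", "t", "g", "c"),
  geneC = c("g", "g", "g", "c", "a", "t")
)

# Fraction of letters that are G or C.
gc_content <- function(dna) {
  sum(dna == "g" | dna == "c") / length(dna)
}

# First way to repeat code: sapply applies the function to each gene.
gc_all <- sapply(genes, gc_content)

# Second way: a FOR loop that fills a named vector.
gc_loop <- numeric(length(genes))
names(gc_loop) <- names(genes)
for (name in names(genes)) {
  gc_loop[name] <- gc_content(genes[[name]])
}

# GC skew: (G - C) / (G + C).
gc_skew <- function(dna) {
  n_g <- sum(dna == "g")
  n_c <- sum(dna == "c")
  (n_g - n_c) / (n_g + n_c)
}

# Toy "genome": G-rich first half, C-rich second half.
genome <- c(rep(c("g", "g", "c", "a", "t"), 20),
            rep(c("c", "c", "g", "a", "t"), 20))

# Local statistics on consecutive windows of 10 letters.
window_size <- 10
starts <- seq(1, length(genome) - window_size + 1, by = window_size)
skew <- sapply(starts, function(s) gc_skew(genome[s:(s + window_size - 1)]))

# Accumulative sum of the skew, and the window where it peaks.
cum_skew <- cumsum(skew)
peak_start <- starts[which.max(cum_skew)]
print(gc_all)
print(peak_start)
```

In this toy genome the accumulated skew rises through the G-rich half and falls through the C-rich half, so the peak marks the place where the composition changes.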
Other reading material for classes
Everybody must read and understand the following texts:
- The Biologist Toolbox: Drawing Systems
- The Biologist Toolbox: Simulating Systems on the computer
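To give an idea of what simulating a system on the computer can look like, here is a minimal sketch of a predator-prey system in the style of the Lotka-Volterra model mentioned in the homework. It advances the system one discrete time step at a time. The function name, the parameter names, and all numeric values are assumptions chosen for illustration; the texts above may use a different formulation.

```r
# Assumption: a simple discrete-time update of the Lotka-Volterra model.
# All rates below are illustrative, not values from the course.
simulate_lotka_volterra <- function(n_steps, prey0, predator0,
                                    birth = 0.1, predation = 0.002,
                                    conversion = 0.001, death = 0.05) {
  prey <- numeric(n_steps)
  predator <- numeric(n_steps)
  prey[1] <- prey0
  predator[1] <- predator0
  for (i in 2:n_steps) {
    # Interaction between the two parts of the system.
    encounters <- prey[i - 1] * predator[i - 1]
    # Prey grow on their own and are lost in encounters.
    prey[i] <- prey[i - 1] + birth * prey[i - 1] - predation * encounters
    # Predators grow from encounters and die on their own.
    predator[i] <- predator[i - 1] + conversion * encounters -
      death * predator[i - 1]
  }
  data.frame(step = 1:n_steps, prey = prey, predator = predator)
}

result <- simulate_lotka_volterra(n_steps = 200, prey0 = 100, predator0 = 20)
print(head(result))
```

Plotting `result$prey` and `result$predator` against `result$step` shows how two simple rules, connected by one interaction, produce oscillations that neither part has on its own.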
Sequences for exercises
Most of the time you will use sequences found at NCBI. For exercises, we can use these sequences:
- Candidatus Carsonella ruddii PV DNA
- Escherichia coli str. K-12 substr. MG1655
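Sequences like these are distributed in FASTA format: a header line starting with `>` followed by one or more lines of sequence. The course may rely on a dedicated package to read them; the sketch below uses only base R so that the format itself is visible. The file content and the helper name `read_fasta` are made up for this illustration, and the code assumes every header is followed by at least one sequence line.

```r
# Write a tiny FASTA file so the example is self-contained.
# These records are invented; real files come from NCBI.
fasta_file <- tempfile(fileext = ".fasta")
writeLines(c(
  ">seq1 toy sequence one",
  "ATGCGC",
  "GGTA",
  ">seq2 toy sequence two",
  "TTAACG"
), fasta_file)

# Hypothetical helper: returns a named list of vectors of single letters.
read_fasta <- function(path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  # Each sequence line belongs to the most recent header above it.
  record <- cumsum(is_header)
  # The identifier is the first word after ">".
  headers <- sub("^>", "", lines[is_header])
  ids <- sub(" .*$", "", headers)
  # Group sequence lines by record, join them, and split into letters.
  chunks <- split(lines[!is_header], record[!is_header])
  sequences <- lapply(chunks, function(x) {
    strsplit(tolower(paste(x, collapse = "")), "")[[1]]
  })
  names(sequences) <- ids
  sequences
}

sequences <- read_fasta(fasta_file)
print(names(sequences))
print(sapply(sequences, length))
```

Storing each sequence as a vector of single letters makes it easy to count letters, take windows, or apply a function to every sequence in the list.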
Required software
For this course we will use the latest versions of R and RStudio. These two tools work together. Install R first, then install RStudio.
These videos may help you.
Online learning
This semester we will run the course online. That is an interesting challenge, since it makes some things harder but others simpler. To start, everybody should read this paper:
Searls DB. “Ten simple rules for online learning”. PLoS Computational Biology. 2012;8(9):e1002631. DOI: 10.1371/journal.pcbi.1002631. Epub 2012 Sep 13. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441493/
This material is also very much recommended:
Recommended readings
Stefan, M. I., Gutlerner, J. L., Born, R. T. & Springer, M. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015). doi:10.1371/journal.pcbi.1004208.
Wilson, G., D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, et al. “Best Practices for Scientific Computing.” PLoS Biology 12, no. 1 (2014): e1001745. doi:10.1371/journal.pbio.1001745.
Noble, William Stafford. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (2009): 1–5. doi:10.1371/journal.pcbi.1000424.
Elson D, Chargaff E (1952). On the deoxyribonucleic acid content of sea urchin gametes. Experientia 8 (4): 143–145.
Chargaff E, Lipshitz R, Green C (1952). Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J Biol Chem 195 (1): 155–160.
Roten C-AH, Gamba P, Barblan J-L, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Research. 2002;30(1):142-144.
Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. “Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics.” BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.
Frey, Carl Benedikt, and Michael A Osborne. “The Future of Employment: How Susceptible Are Jobs to Computerisation?” Technological Forecasting and Social Change 114 (January 2017): 254–80. https://doi.org/10.1016/j.techfore.2016.08.019.
Nuzzo, Regina. “How Scientists Fool Themselves – and How They Can Stop.” Nature 526, no. 7572 (2015): 182–85. https://doi.org/10.1038/526182a.
Polya, G. and Conway, John H. “How to Solve It: A New Aspect of Mathematical Method.” Princeton Science Library.
Wickham, H. “Reshaping Data with the Reshape Package.” Journal of Statistical Software 21, no. 12 (2007): 1–20.
Tong, Frances Poyen. “Statistical Methods for Dose-Response Assays.” UC Berkeley Electronic Theses and Dissertations, 2010.