February 9, 2016
The previous course was “Introduction to Data Science”
This course is “Scientific Computing”
Because computers are essential tools for Molecular Biologists
They control the instruments
They help us to understand the results
They help us to design the experiments
We will focus on the last 2 items
“Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently”
“Software is as important to modern scientific research as telescopes and test tubes”
“…recent studies have found that scientists typically spend 30% or more of their time developing software…”
“We believe that software is just another kind of experimental apparatus and should be built, checked, and used as carefully as any physical apparatus”
“However, […] most [scientists] do not know how reliable their software is. This can lead to serious errors impacting the central conclusions of published research”
“Recent high-profile retractions, technical comments, and corrections because of errors in computational methods include papers in Science, PNAS, the Journal of Molecular Biology, Ecology Letters, the Journal of Mammalogy, Journal of the American College of Cardiology, Hypertension, and The American Economic Review”.
Wilson et al. “Best Practices for Scientific Computing.” PLoS Biology 12,1 (2014)
Modern biology increasingly requires computational and quantitative methods to collect, process, and analyze data, as well as to understand and predict the behavior of complex systems.
Whereas throughout much of the 20th century computational and mathematical biology were niche disciplines, their methods are now becoming an integral part of the practice of biology across all fields.
Stefan et al. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015).
The authors say:
“We broadly categorize these goals into three domains”
- “thinking,”
- “doing”
- “feeling”
This reflects our belief that developing practical programming skills (“doing”) is of limited use if one does not also develop both the ability to think about problems algorithmically (“thinking”) and a positive attitude towards computing (“feeling”).
Students will be able to
Students will be able to
Students will
The Graybeard engineer retired, and a few weeks later the Big Machine, which was essential to the company’s revenue, broke down.
The Manager couldn’t get the machine to work again so the company called in Graybeard as an independent consultant.
Graybeard agrees. He walks into the factory, takes a look at the Big Machine, grabs a sledge hammer, and whacks the machine once whereupon the machine starts right up.
Graybeard leaves and the company is making money again.
The next day Manager receives a bill from Graybeard for $5,000.
Manager is furious at the price and refuses to pay. Graybeard assures him that it’s a fair price.
Manager retorts that if it’s a fair price Graybeard won’t mind itemizing the bill. Graybeard agrees that this is a fair request and complies.
The new, itemized bill reads…
A lot of practice
Today we will focus on a key idea.
To understand the data we need structure
For example, in R we use data frames to represent tabular data. We also have lists, which can contain any kind of element, including other lists. This is a hierarchical structure.
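As a small illustration of these two structures, here is a sketch in base R. The sample values are borrowed from the example table later in these slides; the object `experiment` and its field names are hypothetical and only serve the example.

```r
# A data frame: tabular data, one row per observation, one column per variable.
samples <- data.frame(
  sample = c("GSM91440", "GSM91893"),
  dose   = c("low", "low"),
  agent  = c("caffeine", "caffeine"),
  stringsAsFactors = FALSE
)

# A list can hold elements of any type, including other lists.
# `experiment` and its field names are hypothetical.
experiment <- list(
  title    = "example",
  samples  = samples,
  settings = list(dose = "low", time = list(value = 5, unit = "min"))
)

nrow(samples)                     # number of rows in the table
experiment$settings$time$unit     # walk down the hierarchy with $
str(experiment, max.level = 2)    # print the structure as a tree
```

Each `$` goes one level deeper, much like opening a folder inside another folder.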
Folders on the disk are also a hierarchical structure. Tabular data can be stored in text files, with values in columns.
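The idea of a table stored as plain text can be sketched as follows. This is only an illustration: the file is written to a temporary location, `tab_file` is a hypothetical name, and the tab character is assumed as the column separator.

```r
# Write a small table to a tab-separated text file and read it back.
# `tab_file` is a hypothetical name; the file is created in a temporary folder.
tab_file <- tempfile(fileext = ".txt")

original <- data.frame(
  sample = c("GSM91440", "GSM91428"),
  agent  = c("caffeine", "calcofluor white"),
  stringsAsFactors = FALSE
)

write.table(original, tab_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The raw text: one line per row, with tabs between the columns.
readLines(tab_file)

# Reading the file recovers the tabular structure.
restored <- read.delim(tab_file, stringsAsFactors = FALSE)
all.equal(original, restored)
```

The structure is not in the file format itself but in the convention: one row per line, one separator between columns.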
Text documents also have a logical structure
Ordinary word processors are based on the WYSIWYG (What You See Is What You Get) philosophy
Users are encouraged to change fonts, sizes, colors and other visual attributes
Writing and formatting at the same time is distracting.
The idea is to write first, and format later, as close as possible to the time of publication.
While a word processor is the embodiment of the WYSIWYG (What You See Is What You Get) philosophy, LaTeX represents WYMIWYG—What You Mean Is What You Get. The information you enter defines the meaning of the document. The typesetting program, set up with enormous numbers of typesetting rules, then generates beautiful output for you.
The first mistake that most word processing programs make is that they don’t encourage the separation of style and content—some don’t even permit it. When I write, I structure my text in paragraphs. These are then assembled into sections, chapters, etc.
An alternative to ordinary Word Processors is to use text files with a few rules to mark the role of each element.
Text files can be read on any computer, and will be accessible forever.
Today the most widely used structured text format is Markdown
Here we show some of the rules
*italic* **bold**
italic
bold
# Header 1
## Header 2
### Header 3

* Item 1
* Item 2
    + Item 2a
    + Item 2b

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b
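Because these rules are plain text, a program can recover the logical structure of a document. The following is a minimal sketch in base R, assuming only the header rule shown above (one or more `#` followed by a space). The vector `doc` is a made-up example; in practice the lines would come from `readLines()` on a real file.

```r
# A few lines of Markdown text (hypothetical example content).
doc <- c(
  "# Header 1",
  "Some *italic* and **bold** text.",
  "## Header 2",
  "* Item 1",
  "* Item 2",
  "### Header 3"
)

# Lines that start with one or more '#' followed by a space are headers.
is_header <- grepl("^#+ ", doc)

# The number of '#' characters gives the level of each header.
level <- nchar(sub("^(#+) .*$", "\\1", doc[is_header]))
title <- sub("^#+ ", "", doc[is_header])

# The outline of the document as a small table.
outline <- data.frame(level = level, title = title, stringsAsFactors = FALSE)
outline
```

This is not a full Markdown parser. It only shows that the role of each element can be read directly from the text.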
Use a plain http address or add a link to a phrase:
<http://example.com>
[linked phrase](http://example.com)
Images on the web or local files in the same directory:
![alt text](http://example.com/logo.png)
![alt text](figures/img.png)
|        | sample   | dose | time   | agent            |
|--------|----------|------|--------|------------------|
| 1      | GSM91440 | low  | 5 min  | caffeine         |
| 2      | GSM91893 | low  | 5 min  | caffeine         |
| 3      | GSM91428 | low  | 5 min  | calcofluor white |
| 4      | GSM91881 | low  | 5 min  | calcofluor white |
| | sample | dose | time | agent |
|---|---|---|---|---|
| 1 | GSM91440 | low | 5 min | caffeine |
| 2 | GSM91893 | low | 5 min | caffeine |
| 3 | GSM91428 | low | 5 min | calcofluor white |
| 4 | GSM91881 | low | 5 min | calcofluor white |
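A table like the one above does not need to be typed by hand. The sketch below builds the Markdown lines from a data frame using base R only. The function `to_markdown_table` is a hypothetical helper written for this example, not part of any package, and it leaves out the unnamed row-number column.

```r
# The same four samples as in the table above.
samples <- data.frame(
  sample = c("GSM91440", "GSM91893", "GSM91428", "GSM91881"),
  dose   = "low",
  time   = "5 min",
  agent  = c("caffeine", "caffeine", "calcofluor white", "calcofluor white"),
  stringsAsFactors = FALSE
)

# `to_markdown_table` is a hypothetical helper, not a library function.
to_markdown_table <- function(df) {
  # Join the cells of one row with pipes.
  row_line <- function(cells) paste0("| ", paste(cells, collapse = " | "), " |")
  header    <- row_line(names(df))
  separator <- row_line(rep("---", ncol(df)))
  body      <- apply(df, 1, row_line)   # one text line per row of the table
  c(header, separator, unname(body))
}

writeLines(to_markdown_table(samples))
```

Inside an R Markdown document, `knitr::kable()` provides a ready-made way to do the same job.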
```
this <- is.computer(code)
```
this <- is.computer(code)
File -> New File -> Text File
How to solve it by G. Polya
RStudio incorporated a clever idea
{r}
---
title: "Untitled"
output: html_document
---

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

```text
summary(cars)
```
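The same kind of document can also be created and rendered from the R console. The following is a sketch under some assumptions: the file is written to a temporary location, `rmd_file` is a hypothetical name, and the code chunk uses the usual R Markdown convention of three backticks followed by `{r}`. The rendering step runs only if the `rmarkdown` package and pandoc are available.

```r
# Write a minimal R Markdown file to a temporary location.
# `rmd_file` is a hypothetical name used only in this sketch.
rmd_file <- tempfile(fileext = ".Rmd")

# Three backticks, built with strrep() to keep this example easy to read.
fence <- strrep("`", 3)

writeLines(c(
  "---",
  'title: "Untitled"',
  "output: html_document",
  "---",
  "",
  "This is an R Markdown document.",
  "",
  paste0(fence, "{r}"),   # start of an R code chunk
  "summary(cars)",
  fence                   # end of the chunk
), rmd_file)

# Render to HTML only if the rmarkdown package and pandoc are available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  html_file   # path of the generated HTML document
}
```

In RStudio the "Knit" button performs the rendering step, so the code above is mostly useful for scripts.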
Create a new “RMarkdown” document in RStudio
{r}
Create an HTML document that
birth.txt file
Heuristics:
Argument:
Stefan, M. I., Gutlerner, J. L., Born, R. T. & Springer, M. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015). doi:10.1371/journal.pcbi.1004208.
Wilson, G., D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, et al. “Best Practices for Scientific Computing.” PLoS Biology 12, no. 1 (2014): e1001745. doi:10.1371/journal.pbio.1001745.
Noble, William Stafford. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (2009): 1–5. doi:10.1371/journal.pcbi.1000424.
Describe what we can get from NCBI.
It is big, so we will focus only on
Write your slides in RMarkdown