The scientist's work is to understand Nature.
We start by Observing Nature, usually measuring values.
These are exploratory experiments.
We study this in other courses.
The thing we study must be reproducible, and we need to see that repetition.
We can find these repeating patterns using plots, linear models, clustering, etc.
This is the most important part.
Good answers to bad questions are useless.
Good questions are good, even if we don’t have answers.
We answer these questions using models and explanations.
Valid models should make predictions that we can test in the lab…
These are validation experiments.
If the results do not match the prediction, we know that the explanation is wrong. Two steps back.
Now we publish our data and model, so other scientists can validate or reject it.
The final validation is to be published.
If the paper is accepted and published, our work becomes part of our shared human knowledge.
The goal of Science is to produce new Knowledge.
When we observe Nature we use our previous Knowledge.
We look for new Patterns that raise new Questions.
“Noise becomes Signal”
1 Empirical
2 Theoretical
3 Simulation-based
4 Data-based
How can we know the genome of an organism?
What is the quality of a DNA read? What does quality 30 mean?
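As a starting point for this question, here is a minimal sketch of the standard Phred definition, Q = -10 · log10(p), where p is the probability that a base call is wrong. The function names are hypothetical, chosen only for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality corresponding to an error probability p."""
    return -10 * math.log10(p)


# Quality 30 corresponds to about 1 wrong base call in 1000.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

Under this definition, every 10 points of quality divide the error probability by 10.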
FASTQ file format for sequences with quality scores
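To make the format concrete, the sketch below reads FASTQ records assuming the common layout of exactly four lines per record (header starting with `@`, sequence, `+` separator, quality string) and the Phred+33 quality encoding. Both are assumptions; older files may use a different offset. The function `read_fastq` and the example read are made up for illustration.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality_line = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so quality = ASCII code - 33.
        qualities = [ord(ch) - 33 for ch in quality_line]
        yield header[1:], sequence, qualities


# A made-up record with one quality character per base.
example = StringIO("@read1\nACGT\n+\nII?5\n")
records = list(read_fastq(example))
print(records)
```

Here `I`, `?` and `5` decode to qualities 40, 30 and 20 respectively.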
Reads are mapped to the genome using one of many tools (bowtie, bwa, hisat)
All these programs give their results in SAM/BAM format. What is that format?
What are SAM files? What are they used for?
What is the difference between SAM and BAM files?
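As a hint for these questions: SAM is a tab-separated text format with 11 mandatory fields per alignment line, and BAM is its compressed binary equivalent. The sketch below splits one alignment line into those fields; the function `parse_sam_line` and the example read are hypothetical, and header lines (starting with `@`) are not handled.

```python
# The 11 mandatory SAM fields, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into a dict of its mandatory fields."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["tags"] = parts[11:]  # optional TAG:TYPE:VALUE fields, if any
    return record


# A made-up alignment: read1 mapped to chr1 at position 100.
example_line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tII?5"
record = parse_sam_line(example_line)
is_unmapped = bool(record["FLAG"] & 0x4)  # flag bit 0x4 marks an unmapped read
print(record["RNAME"], record["POS"], record["CIGAR"], is_unmapped)
```

In practice a library such as pysam or the samtools command-line tool would be used instead of hand parsing, especially for BAM files.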
A PCR test was evaluated on a sample of patients, with the following results:
Test result | COVID+ | COVID- | Total |
---|---|---|---|
PCR+ | 135 | 2 | 137 |
PCR- | 3 | 216 | 219 |
Total | 138 | 218 | 356 |
How many false positives do we have? How many false negatives?
What is the sensitivity of this test? What is the specificity?
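One way to work this out is sketched below, taking the counts from the table and using the usual definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The variable names are chosen for illustration.

```python
# Counts read from the table (rows: test result, columns: true status).
true_positives = 135   # PCR+ and COVID+
false_positives = 2    # PCR+ but COVID-
false_negatives = 3    # PCR- but COVID+
true_negatives = 216   # PCR- and COVID-

# Sensitivity: fraction of infected people that the test detects.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: fraction of healthy people that the test clears.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.3f}")  # 0.978
print(f"specificity = {specificity:.3f}")  # 0.991
```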
If you apply the previous test to a random person on the street (not showing any symptoms), and the test is positive, what is the probability that the person really has COVID?
Use the sensitivity and specificity values from the previous question, and assume that the prevalence of COVID is 1%.
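A possible approach is Bayes' theorem: P(COVID | positive) = sensitivity · prevalence / P(positive), where P(positive) adds the true positives and the false positives in the population. The sketch below uses the table counts (135/138 and 216/218) and the 1% prevalence given in the question; the variable names are illustrative.

```python
sensitivity = 135 / 138   # P(positive | COVID)
specificity = 216 / 218   # P(negative | no COVID)
prevalence = 0.01         # P(COVID), as stated in the question

# Total probability of a positive test: true positives plus false positives.
p_true_positive = sensitivity * prevalence
p_false_positive = (1 - specificity) * (1 - prevalence)
p_positive = p_true_positive + p_false_positive

# Bayes' theorem: probability of COVID given a positive test.
p_covid_given_positive = p_true_positive / p_positive
print(f"P(COVID | positive) = {p_covid_given_positive:.2f}")  # 0.52
```

Even with a test this accurate, only about half of the positives are real cases, because at 1% prevalence healthy people greatly outnumber infected ones.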
Please write the Newick code for this tree.
(((F:73,A:41):19,(C:30,D:55):82):14,(B:78,E:90):48);
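To check that a Newick string describes the intended tree, one option is to parse it and list each leaf with its distance to the root. The sketch below is a minimal recursive parser written for illustration. It assumes no whitespace, no quoted labels and a terminating `;`, and the function names are hypothetical. A library such as Biopython's `Bio.Phylo` would be the usual choice for real work.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ','
                children.append(parse_node())
            pos += 1  # skip ')'
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]  # empty for unnamed internal nodes
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return parse_node()


def leaf_depths(node, depth=0.0):
    """Map each leaf name to the sum of branch lengths from the root."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


tree = parse_newick("(((F:73,A:41):19,(C:30,D:55):82):14,(B:78,E:90):48);")
depths = leaf_depths(tree)
print(depths)
```

For example, leaf F sits at distance 73 + 19 + 14 = 106 from the root, and B at 78 + 48 = 126.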
What is the difference?
Why do we need both?
Examples of each one