Who
What
How
When
We want to build radios and devices within several constraints
Each part can be simple
The emergent behavior can be complex
We model a system to understand, predict, and control its behavior
A good representation allows simulations and predictions
Protein precursor processing, doi:10.1002/psp4.12155
CaMKII regulation by calmodulin, doi:10.1371/journal.pone.0029406
A scientist's work is to understand Nature
We start by Observing Nature, usually by measuring values.
These are exploratory experiments.
The thing we study must be repeatable, and we need to see that repetition.
We can find these repeated patterns using plots, linear models, clustering, etc.
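As one concrete example of the "linear models" mentioned above, here is a minimal sketch of an ordinary least-squares line fit using only the Python standard library. The function names `fit_line` and `r_squared` and the toy measurements are hypothetical, chosen for illustration; clustering is not shown.

```python
# Minimal ordinary least-squares fit y ≈ intercept + slope * x (standard library only).
# The measurements in the usage example are hypothetical toy values, not real data.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance of ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values are constant; R^2 is undefined")
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    time_h = [0, 1, 2, 3, 4]              # hypothetical sampling times (hours)
    signal = [1.0, 2.1, 2.9, 4.2, 5.0]    # hypothetical measured values
    a, b = fit_line(time_h, signal)
    print(f"intercept={a:.2f} slope={b:.2f} R^2={r_squared(time_h, signal, a, b):.3f}")
```

An R^2 close to 1 suggests the linear pattern explains most of the variation; it does not by itself tell us why the pattern exists, which is where the questions come in.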
This is the most important part.
Good answers to bad questions are useless.
Good questions are good, even if we don’t have answers
We answer these questions using models and explanations
Valid models should make predictions that we can test in the lab…
These are validation experiments.
If the results do not match the prediction, we know that the explanation is wrong. Two steps back.
Now we publish our data and model, so other scientists can validate or reject them.
The final validation is to be published.
If the paper is accepted and published, our work becomes part of our shared human knowledge.
The goal of Science is to produce new Knowledge.
When we observe Nature, we use our previous Knowledge
We look for new Patterns that raise new Questions.
“Noise becomes Signal”
Unfortunately, we do not have time to study all kinds of networks relevant in molecular biology
(at least, not in this course)
We will focus on interaction networks
That is, networks that can be built from gene expression data
In other words, we will speak about
Transcription
We will learn to analyze gene expression, so we can design better experiments and achieve higher impact
This course has basically three parts
My blog is at https://www.dry-lab.org/
Course’s blog at https://www.dry-lab.org/blog/2023/sysbio/
All material will be published there
More precisely, mRNA concentration
We want to know
Measuring protein concentration is hard
We assume that protein concentration is proportional to mRNA concentration
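Why this assumption is enough for comparing conditions can be shown in one line. Writing $P$ for protein concentration, $m$ for mRNA concentration, and $k$ for the unknown proportionality constant (these symbols are introduced here for illustration, and we additionally assume $k$ is the same in both conditions), the constant cancels in a ratio:

```latex
P_1 = k\, m_1, \qquad P_2 = k\, m_2
\quad\Longrightarrow\quad
\frac{P_2}{P_1} = \frac{k\, m_2}{k\, m_1} = \frac{m_2}{m_1}
```

So a fold change measured on mRNA is read as the same fold change in protein, even though $k$ itself is never measured.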
Basically
If you have primers for each gene
Raw data: CT value for each gene/condition
and CT value for calibration reference
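One common way to turn these CT values into relative expression is the comparative Ct ("2^-ΔΔCt") method. The sketch below is not necessarily the method used in this course; it assumes roughly 100% amplification efficiency (product doubles every cycle) and a reference gene whose expression does not change between conditions. Function names and CT values are hypothetical.

```python
# Sketch of the comparative Ct (2^-ΔΔCt) method for relative qPCR quantification.
# Assumptions: ~100% amplification efficiency (doubling per cycle) and a
# reference gene equally expressed in both conditions.

def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize the target gene's Ct by the reference gene's Ct in the same sample."""
    return ct_target - ct_reference


def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Expression of the target in the treated sample relative to the control sample."""
    ddct = (delta_ct(ct_target_treated, ct_ref_treated)
            - delta_ct(ct_target_control, ct_ref_control))
    # Reaching the threshold in fewer cycles means more starting template,
    # hence the negative exponent.
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical CT values, for illustration only.
    fc = fold_change(ct_target_treated=22.0, ct_ref_treated=18.0,
                     ct_target_control=25.0, ct_ref_control=18.0)
    print(f"fold change = {fc:.1f}")  # ΔΔCt = 4 - 7 = -3, so 2**3 = 8.0
```

If the measured efficiency is far from 100%, the base 2 should be replaced by the actual per-cycle amplification factor.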
Southern/Northern/Western blot can detect, but not quantify
(I think so. I’m not a biologist)
Instead, we have macro- and microarrays
Raw data: Light intensity (luminescence) in one or more wavelengths
This is measured in arbitrary units, and is a number between
0 and 65535
(that is, a 16-bit value)
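A 16-bit unsigned number can take 2^16 = 65536 distinct values, from 0 to 65535. For arrays read at two wavelengths, a widely used first step is to convert each spot's pair of intensities into a log-ratio M and an average log-intensity A (the axes of an "MA plot"). This is a sketch assuming a two-channel array with channels called red and green here; the function names are hypothetical.

```python
import math

MAX_16BIT = 2 ** 16 - 1  # largest unsigned 16-bit value: 65535


def ma_values(red: float, green: float) -> tuple[float, float]:
    """Return (M, A) for one spot of a two-channel array.

    M = log2(red / green)          log-ratio between the two channels
    A = 0.5 * log2(red * green)    average log-intensity of the spot
    """
    for value in (red, green):
        if not 0 < value <= MAX_16BIT:
            raise ValueError(f"intensity {value} outside (0, {MAX_16BIT}]")
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


def is_saturated(value: float) -> bool:
    """A spot at the ceiling may be brighter than the scanner can record."""
    return value >= MAX_16BIT


if __name__ == "__main__":
    # Hypothetical intensities for one spot.
    print(ma_values(1024, 256))
```

Working on a log scale turns a 4-fold increase and a 4-fold decrease into +2 and -2, which makes up- and down-regulation symmetric. Saturated spots are usually flagged rather than trusted, because their true intensity is unknown.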
mRNA is reverse-transcribed and fragmented.
Fragments are sequenced. Reads are aligned to the reference genome
Raw data: SAM/BAM file with location of each read in the reference genome
Processed data: Number of reads per gene, normalized by gene length
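Two common length-normalization schemes are RPKM (reads per kilobase per million mapped reads) and TPM (transcripts per million). The sketch below shows both from a table of read counts and gene lengths; which one a given dataset uses must be checked in its documentation. Gene names, counts, and lengths are hypothetical.

```python
# Sketch of two common length normalizations of RNA-seq read counts.
# Inputs are dicts keyed by gene name; all values here are hypothetical.

def rpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads Per Kilobase of gene per Million mapped reads."""
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / total_millions for g in counts}


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts Per Million: normalize by length first, then rescale to sum to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(rate.values())
    return {g: r / scale * 1e6 for g, r in rate.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 300}        # hypothetical read counts
    lengths = {"geneA": 1000, "geneB": 3000}     # hypothetical lengths (bp)
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

In this toy example geneB has three times the reads of geneA but is also three times longer, so both genes end up with the same normalized value. TPM values always sum to one million within a sample, which makes proportions easier to compare across samples than RPKM.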
Gene Expression Omnibus
Let’s take a look at
GSE56896
NCBI standard
Industry standard
These are optional, try at least one.
Write a document (in English) explaining your results