Class 7: Multiple tests

Systems Biology

Andrés Aravena, PhD

November 16, 2023

Essential Maths

Math is not sums, calculations, and formulae.
It is pulling things apart to understand how things work

Colin Wright, juggler,
inventor of the mathematical notation of juggling

Essential Math for Biology

In my opinion, all biologists should know something about

Set theory
Logic
Probabilities
Graphs (Networks)

The rest depends on each case

(maybe calculus and linear algebra)

Probabilities

An event is a set of outcomes

The set of all possible outcomes is often called Ω

An event 𝐴 can be seen as the set of all outcomes that make the event true

For example,

Fever={Temperature>37.5°C}

Evaluating rational beliefs

An event will become either true or false after an experiment

For example, a die can show a 4 or not

We want to give a value to our rational belief that the event will become true after the experiment

The numeric value is called Probability

Probabilities as Areas

It is useful to think that the probability of an event is the area in the drawing

The total area of Ω is 1

Usually we do not know the shape of 𝐴
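As a minimal sketch of "probability as area", we can assume a fair six-sided die, so that each outcome covers the same share of Ω. The die and the names `omega`, `prob`, `four` and `high` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Assumption for illustration: a fair six-sided die, so every outcome
# covers the same "area", 1/6 of the sample space.
omega = {1, 2, 3, 4, 5, 6}

def prob(event, space=omega):
    """Probability of an event as its share of the sample space."""
    return Fraction(len(event & space), len(space))

four = {4}                              # the event "the die shows 4"
high = {x for x in omega if x >= 4}     # the event "4, 5 or 6"

print(prob(four))    # 1/6
print(prob(high))    # 1/2
print(prob(omega))   # 1
```

Here the shape of each event is known because we can list its outcomes; in real experiments we usually cannot.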

Probabilities depend on our knowledge

Our rational beliefs depend on our knowledge

If we represent our knowledge (or hypothesis) by 𝑍, then the probability of an event 𝐴 is written as \[ℙ(A|Z)\] We read this as “the probability of event 𝐴, given that we know 𝑍”

For example, “the probability that we get a 4, given that the die is symmetrical”

Important idea

The order is relevant \[ℙ(A|Z)≠ℙ(Z|A)\] There are two events, 𝐴 and 𝑍

The one written after | is what we assume to be true

The one written before | is what we are asking for

One we know, the other we do not

Visually

Now outcomes are limited only to the 𝑍 region

\(ℙ(A|Z)\) is the area of the part of 𝐴 that lies inside 𝑍, measured with respect to the area of 𝑍 instead of Ω

The shape of 𝑍 is often unknown
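A small sketch of this picture, again assuming a fair six-sided die where areas reduce to counting outcomes. The events `A` and `Z` and the helper `cond_prob` are hypothetical names chosen for the example.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: fair six-sided die
A = {4}                     # event: the die shows 4
Z = {2, 4, 6}               # knowledge: the result is even

def cond_prob(event, given):
    """P(event | given): size of the overlap relative to the size of `given`."""
    return Fraction(len(event & given), len(given))

print(cond_prob(A, omega))   # 1/6  (no extra knowledge)
print(cond_prob(A, Z))       # 1/3  (outcomes limited to Z)
print(cond_prob(Z, A))       # 1    (swapping the order changes the answer)
```

The last two lines give different values, which is the point of the "order is relevant" slide.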

Degrees of belief

If, given our knowledge 𝑍, the event 𝐵 is more plausible than the event 𝐴, then \[ℙ(A|Z)≤ℙ(B|Z)\]

For example, given that the die is symmetrical, the probability that we get either 4, 5 or 6 is at least as large as the probability that we get a 4 \[ℙ(\{4\}|Z)≤ℙ(\{4,5,6\}|Z)\]

Degrees of belief with new info

On the other hand, if we get new information, the probabilities may change

The same event 𝐴 may be more plausible under a new hypothesis 𝑌 than under the initial hypothesis 𝑍

Then \[ℙ(A|Z)≤ℙ(A|Y)\]

Probability rules based on these two ideas

It has been proven that any measure of rational belief consistent with these two ideas must follow these rules

A probability is a number between 0 and 1 inclusive \[ℙ(A) ≥ 0\textrm{ and } ℙ(A)≤1\]
The probability of a sure event is 1 \[ℙ(\textrm{True}) = 1\]
The probability of an impossible event is 0 \[ℙ(\textrm{False}) = 0\]

Complex events

We are interested in non-trivial events, which are usually combinations of smaller events

For example, we may ask “what is the probability that, in a group of 𝑛 people, at least two people have the same birthday?”

Fortunately, any complex event can be decomposed into simpler events, combined with and, or and not connectors

Exercise: decompose the birthday event into simpler ones

Probability of not 𝐴

If the event 𝐴 becomes more and more plausible, then the opposite event not 𝐴 becomes less and less plausible

It can be shown that we always have \[ℙ(\textrm{not } A) = 1-ℙ(A)\]

Probability of 𝐴 and 𝐵

\[ℙ(A\text{ and } B)=\frac{\text{Number of cases where }(A\text{ and } B)\text{ is true}}{\text{Total cases of combinations of }A\text{ and } B}\]

If \(n_A\) and \(n_B\) are the total number of cases for \(A\) and \(B\), then the total number of cases is \(n_A⋅n_B\)

In the same way, if \(m_A\) and \(m_B\) are the number of cases where \(A\) and \(B\) are true, respectively, then the number of cases where \((A\text{ and }B)\) is true is \(m_A⋅m_B\)

\[ℙ(A\text{ and } B)=\frac{m_A⋅m_B}{n_A⋅n_B}=\frac{m_A}{n_A}⋅\frac{m_B}{n_B}\]

Interpretation

We could say that \[\frac{m_A}{n_A}=ℙ(A)\qquad\frac{m_B}{n_B}=ℙ(B)\] but we have to be careful. The result of \(A\) may affect \(m_B\) and \(n_B\). It is better to write \[\frac{m_A}{n_A}=ℙ(A)\qquad\frac{m_B}{n_B}=ℙ(B|A)\]

Rewriting the Probability of 𝐴 and 𝐵

\[ℙ(A\text{ and } B)=\frac{m_A}{n_A}⋅\frac{m_B}{n_B}=ℙ(A)⋅ℙ(B|A)\] To simplify, instead of \(ℙ(A\text{ and } B)\) we write \(ℙ(A, B)\)

Thus, we write \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\] “Prob that (𝐴 and 𝐵) happens is Prob that 𝐴 happens times Prob that 𝐵 happens given that A happens”
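To see why the second factor must be \(ℙ(B|A)\) and not \(ℙ(B)\), here is a sketch with an assumed example that is not in the slides: an urn with 3 red and 2 blue balls, from which we draw two balls without replacement. 𝐴 is "the first ball is red" and 𝐵 is "the second ball is red".

```python
from fractions import Fraction
from itertools import permutations

# Assumed example: 3 red and 2 blue balls, two draws without replacement.
balls = ["R1", "R2", "R3", "B1", "B2"]
draws = list(permutations(balls, 2))      # 20 equally likely ordered pairs

def is_red(ball):
    return ball.startswith("R")

a = [d for d in draws if is_red(d[0])]    # cases where A is true
a_and_b = [d for d in a if is_red(d[1])]  # cases where A and B are true

p_a = Fraction(len(a), len(draws))             # P(A)
p_b_given_a = Fraction(len(a_and_b), len(a))   # P(B|A), counted inside A only
p_joint = Fraction(len(a_and_b), len(draws))   # P(A,B), counted directly

print(p_a, p_b_given_a, p_joint)    # 3/5 1/2 3/10
print(p_joint == p_a * p_b_given_a) # True
```

After a red ball is removed, only 2 of the 4 remaining balls are red, so the result of 𝐴 changes the counts for 𝐵.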

Joint Probability

We know that \((A\text{ and } B)\) is always the same as \((B\text{ and } A)\)

There are two ways to calculate the probability of 𝐴 and 𝐵 happening simultaneously

Start with the prob. of \(A\) and then of \(B\) given that \(A\) is true \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\]
Start with the prob. of \(B\) and then of \(A\) given that \(B\) is true \[ℙ(A,B)=ℙ(B)⋅ℙ(A|B)\]
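Both routes can be checked by counting. This sketch assumes a fair six-sided die and two illustrative events, 𝐴 = "the result is even" and 𝐵 = "the result is 5 or 6"; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: fair six-sided die
A = {2, 4, 6}               # event: the result is even
B = {5, 6}                  # event: the result is 5 or 6

def prob(event, given=omega):
    """P(event | given), by counting equally likely outcomes."""
    return Fraction(len(event & given), len(given))

via_a = prob(A) * prob(B, given=A)    # P(A)·P(B|A) = 1/2 · 1/3
via_b = prob(B) * prob(A, given=B)    # P(B)·P(A|B) = 1/3 · 1/2

print(via_a, via_b, prob(A & B))      # 1/6 1/6 1/6
```

The individual factors differ between the two routes, but the products agree with the directly counted joint probability.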

Summary

The order matters: \(ℙ(A|Z)≠ℙ(Z|A)\)
To get the probability of \(A\) and \(B\) together we find the probability of \(A\) and then of \(B\) given that \(A\) is true \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\]

Birthday paradox

What is the probability that two people in the class share the same birthday?

How many people do we need to have at least a 50% probability?

Intuition says we need 365/2 people

That is wrong

Decomposing complex into simple

Conditional probability

Probability of 𝐴 or 𝐵

We know how to calculate \(ℙ(A\text{ and } B)\) and \(ℙ(\text{not } A)\)

We also know De Morgan’s law, which swaps ANDs with ORs
\[\text{not } (A \text{ or } B) = (\text{not } A) \text{ and } (\text{not } B)\]

Therefore we can write

\[ \begin{aligned} ℙ(A \text{ or } B) & = 1 - ℙ(\text{not }(A \text{ or } B))\\ & = 1-ℙ( (\text{not } A) \text{ and } (\text{not } B)) \end{aligned} \]

Using the multiplication rule

\[ \begin{aligned} ℙ(A \text{ or } B) & = 1-ℙ( (\text{not } A) \text{ and } (\text{not } B)) \\ & = 1-ℙ(\text{not } A)⋅ℙ(\text{not } B|\text{not } A) \end{aligned} \]

Using the negation rule \[ \begin{aligned} ℙ(A \text{ or } B) & = 1-ℙ(\text{not } A)⋅(1- ℙ(B|\text{not } A)) \\ & = 1-ℙ(\text{not } A) + ℙ(\text{not } A)⋅ℙ(B|\text{not } A) \end{aligned} \]

Using the multiplication rule again

\[ \begin{aligned} ℙ(A \text{ or } B) & = 1 -ℙ(\text{not } A) + ℙ(\text{not } A,B) \\ ℙ(A \text{ or } B) & = 1 -(1-ℙ(A)) + ℙ(\text{not } A|B)ℙ(B) \\ ℙ(A \text{ or } B) & = ℙ(A) + (1-ℙ(A|B))ℙ(B) \\ ℙ(A \text{ or } B) & = ℙ(A) + ℙ(B)-ℙ(A|B)ℙ(B) \\ ℙ(A \text{ or } B) & = ℙ(A) + ℙ(B)-ℙ(A,B) \end{aligned} \] You need to remember only the last line

The previous lines justify why the last one is always true

Do not count twice

If A and B can happen at the same time, then \(ℙ(A) + ℙ(B)\) counts the intersection twice

So we have to take out the intersection \(ℙ(A,B)\) \[ℙ(A \text{ or } B) = ℙ(A) + ℙ(B)-ℙ(A,B)\]
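A quick check by counting, assuming a fair six-sided die with the illustrative events 𝐴 = "even" and 𝐵 = "4, 5 or 6" (these events and the helper `prob` are not from the slides):

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: fair six-sided die
A = {2, 4, 6}               # event: the result is even
B = {4, 5, 6}               # event: the result is 4, 5 or 6

def prob(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(omega))

direct = prob(A | B)                        # count the union once
by_rule = prob(A) + prob(B) - prob(A & B)   # add, then remove the overlap

print(direct, by_rule)        # 2/3 2/3
print(prob(A) + prob(B))      # 1, too large: 4 and 6 were counted twice
```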

It gets complicated

If there are three compatible events, things get messy

\[\begin{aligned} & ℙ(A \text{ or } B \text{ or } C) \\ & = ℙ(A) + ℙ(B \text{ or } C)-ℙ(A,(B \text{ or } C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - ℙ((A,B) \text{ or } (A,C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - (ℙ(A,B) + ℙ(A,C) - ℙ(A,B,C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - ℙ(A,B) - ℙ(A,C) + ℙ(A,B,C) \end{aligned} \]

It gets worse with more events
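The three-event formula can be verified on an assumed toy experiment: pick a whole number from 1 to 30 with equal probability, and let 𝐴, 𝐵, 𝐶 be "multiple of 2", "multiple of 3" and "multiple of 5". The experiment and the names below are illustrative only.

```python
from fractions import Fraction

omega = set(range(1, 31))                  # assumed: uniform choice from 1..30
A = {x for x in omega if x % 2 == 0}       # multiples of 2
B = {x for x in omega if x % 3 == 0}       # multiples of 3
C = {x for x in omega if x % 5 == 0}       # multiples of 5

def prob(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(omega))

direct = prob(A | B | C)
by_rule = (prob(A) + prob(B) + prob(C)
           - prob(B & C) - prob(A & B) - prob(A & C)
           + prob(A & B & C))

print(direct, by_rule)    # 11/15 11/15
```

With three events there are already seven terms; with 𝑘 events the same formula needs \(2^k-1\) terms.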

Boole’s inequality

Now we can see that

\[{\mathbb P}\left(\bigcup_{i=1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} {\mathbb P}(A_i)\]

in other words

\[\mathbb P\left(A_1\text{ OR }A_2 \text{ OR } … \text{ OR } A_k \right) \le \mathbb P(A_1)+\mathbb P(A_2)+\cdots+\mathbb P(A_k)\]
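A sketch that checks the inequality on a small assumed example, a fair six-sided die with three overlapping events (the events and the name `events` are chosen for illustration):

```python
from fractions import Fraction

omega = set(range(1, 7))                  # assumed: fair six-sided die
events = [{1, 2}, {2, 3}, {3, 4, 5}]      # illustrative overlapping events

def prob(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(omega))

p_union = prob(set().union(*events))      # P(A_1 or A_2 or A_3)
sum_of_probs = sum(prob(e) for e in events)

print(p_union, sum_of_probs)      # 5/6 7/6
print(p_union <= sum_of_probs)    # True
```

The sum can exceed 1, so the bound is only informative when the individual probabilities are small. The two sides are equal only when no two events can happen together.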

There is an easier way

Using De Morgan’s rule

\[\begin{aligned} & ℙ(A \text{ or } B \text{ or } C) \\ & = 1 - ℙ((\text{not } A) \text{ and } (\text{not } B) \text{ and } (\text{not } C))\\ & = 1 - ℙ(\text{not } A)⋅ℙ(\text{not } B | \text{not } A)⋅ℙ(\text{not } C | \text{not } A, \text{not } B)\\ & = 1 - (1-ℙ(A))⋅(1-ℙ(B | \text{not } A))⋅(1-ℙ(C | \text{not } A, \text{not } B)) \end{aligned} \]

This is often easier to calculate

Solving the paradox

Let’s say we have three people, with birthdays \(x_1, x_2\) and \(x_3.\)

The probability that there are at least two people with the same birthday is \[ℙ(x_2=x_1 \text{ or } x_3=x_2 \text{ or } x_3=x_1)\] which can be rewritten as \[1-ℙ(x_2≠x_1 \text{ and } x_3≠x_2 \text{ and } x_3≠x_1)\]

Now we only have “and” combinations

We want to calculate \[1-ℙ(x_2≠x_1 \text{ and } x_3≠x_2 \text{ and } x_3≠x_1)\] We can separate like this (only the first “and”) \[1-ℙ(x_2≠x_1)⋅ℙ(x_3≠x_2 \text{ and } x_3≠x_1|x_2≠x_1)\] Assuming 365 equally likely birthdays, the second person must avoid one day and the third person must avoid two different days, so we have \[1-\frac{364}{365}⋅\frac{363}{365}\]
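The same product extends to 𝑛 people: each new person must avoid all the birthdays already taken. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name `p_shared_birthday` is a hypothetical choice.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday),
    assuming `days` equally likely and independent birthdays."""
    p_all_different = 1.0
    for k in range(n):
        # person number k+1 must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

print(round(p_shared_birthday(3), 4))    # 0.0082

# smallest group size with at least 50% probability of a shared birthday
smallest = next(n for n in range(1, 366) if p_shared_birthday(n) >= 0.5)
print(smallest)                          # 23
```

So 23 people are enough, far fewer than the 365/2 that intuition suggests.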

Multiple tests

Family-wise error rate

If \(m\) independent comparisons are performed, the family-wise error rate (FWER) is given by

\[\bar{\alpha} = 1-\left( 1-\alpha_{\{\text{per comparison}\}} \right)^m\]

Hence, \(\bar{\alpha}\) increases as the number of comparisons increases.
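A short sketch of how fast the FWER grows, assuming independent comparisons and a per-comparison level of 0.05. The function name `fwer` and the chosen values of \(m\) are illustrative.

```python
def fwer(alpha, m):
    """Family-wise error rate for m independent comparisons,
    each performed at per-comparison level alpha."""
    return 1 - (1 - alpha) ** m

for m in (1, 6, 20, 100):
    print(m, round(fwer(0.05, m), 4))
# for m = 6 this prints 0.2649
```

This is the "not … and not … and not" rule applied to the event "no comparison gives a false positive".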

Bonferroni method

If we do not assume that the comparisons are independent, then we can still say:

\[\bar{\alpha} \le m \cdot \alpha_{\{\text{per comparison}\}}\]

which follows from Boole’s inequality. Example: \(1-(1-0.05)^6 ≈ 0.2649 \le 0.05 \times 6 = 0.3\)
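The bound can be compared with the exact value for independent comparisons. The sketch also shows the usual way the bound is used, which is not spelled out on the slide and is stated here as an assumption: testing each comparison at \(\alpha/m\) keeps the family-wise error rate at most \(\alpha\). All function names are hypothetical.

```python
def fwer_independent(alpha, m):
    """Exact FWER when the m comparisons are independent."""
    return 1 - (1 - alpha) ** m

def bonferroni_bound(alpha, m):
    """Upper bound on the FWER from Boole's inequality (capped at 1)."""
    return min(1.0, m * alpha)

def bonferroni_level(target_fwer, m):
    """Per-comparison level that keeps the bound at target_fwer
    (standard Bonferroni correction, an assumption beyond the slide)."""
    return target_fwer / m

alpha, m = 0.05, 6
print(round(fwer_independent(alpha, m), 4), round(bonferroni_bound(alpha, m), 4))
# 0.2649 0.3

per_test = bonferroni_level(0.05, m)
print(fwer_independent(per_test, m) <= 0.05)    # True
```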