12 November 2014
Bioleaching bacteria.
We focused on several questions.
One of the key ones was:
Understanding transcriptional regulation in A. ferrooxidans.
Limitations:
Modeling regulation by integrating genomic and transcriptomic data.
Identification of sets of genes sharing similar behaviors through different environmental stresses:
In contrast, a transcriptional regulatory network (TRN) corresponds to a physical model of the interactions
Co-expression of a pair of genes is explained by the existence of a common regulator acting on both of them, directly or indirectly through a regulatory cascade. Either:
Since regulation data for A. ferrooxidans are scarce, we use E. coli as a test platform.
The network of experimentally validated regulations described in RegulonDB only explains 3,990 (6.5%) of the 61,506 observed co-expressions.
A putative TRN was built using the E. coli genomic sequence and patterns from the Prodoric database of transcription factors and binding sites.
We found that this putative TRN explained 91.1% of the pairs of co-expressed operons.
Putative TRNs are usually huge because sequence-based prediction methods have low specificity.
For the studied organism (here a prokaryote), the LOMBARDE method requires the following input:
In a second stage LOMBARDE deciphers the co-expression of the pair \((gene_{1}, gene_{2})\in \mathcal O\) by identifying a common regulator \(gene_{3}\) which is connected to both \(gene_{1}\) and \(gene_{2}\) via regulatory cascades of high confidence.
In graph terms, a subgraph \(S\) is an explanation for the pair \((gene_{1}, gene_{2})\) if \(S\) is the union of two independent paths from \(gene_{3}\) (the common regulator) to \(gene_{1}\) and to \(gene_{2}\).
An explanation for \((gene_{1}, gene_{2})\) is said to be confident if it is of minimum cost among all the explanations for the pair.
Our model transforms a parsimony criterion into a graph minimization problem.
The output of LOMBARDE is a subgraph \(\mathcal L\) of \(\mathcal G\) built as the union of all confident explanations for each co-expressed pair in \(\mathcal O\).
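The idea of a minimum-cost explanation can be illustrated with a small sketch. This is not the LOMBARDE implementation: the toy network, the arc costs, and the names `shortest_costs` and `cheapest_common_regulators` are hypothetical, every node is allowed as a candidate regulator, and the sketch only sums two shortest-path costs without explicitly checking that the two paths are independent.

```python
import heapq

def shortest_costs(graph, source):
    """Dijkstra: cheapest cost from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for target, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

def cheapest_common_regulators(graph, gene1, gene2):
    """Return (cost, regulators) minimizing cost(r -> gene1) + cost(r -> gene2).

    Simplification (assumption): path independence is not enforced.
    """
    nodes = set(graph) | {t for arcs in graph.values() for t, _ in arcs}
    best_cost, best = float("inf"), []
    for regulator in sorted(nodes):
        dist = shortest_costs(graph, regulator)
        if gene1 in dist and gene2 in dist:
            total = dist[gene1] + dist[gene2]
            if total < best_cost:
                best_cost, best = total, [regulator]
            elif total == best_cost:
                best.append(regulator)
    return best_cost, best

# Hypothetical putative TRN: regulator -> [(target, arc cost), ...]
toy_trn = {
    "tfA": [("op1", 1.0), ("op2", 1.0), ("tfB", 1.0)],
    "tfB": [("op2", 0.5), ("op3", 1.0)],
    "tfC": [("op1", 3.0), ("op3", 3.0)],
}

cost_12, regs_12 = cheapest_common_regulators(toy_trn, "op1", "op2")
cost_13, regs_13 = cheapest_common_regulators(toy_trn, "op1", "op3")
print(cost_12, regs_12)
print(cost_13, regs_13)
```

In this toy network both pairs are best explained by `tfA`; the second pair uses the cascade through `tfB`, which is cheaper than the two direct but costly arcs from `tfC`.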
The putative TRN for E. coli contains 25,604 arcs, 444 of which are experimentally validated.
After applying LOMBARDE most of its arcs are discarded, keeping only 4,817 (18.8%).
However, among the validated arcs, LOMBARDE is less aggressive, keeping 289 (65.1%) of them.
This shows that the output of LOMBARDE is biased towards experimentally validated regulations.
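The bias can be made explicit by recomputing the proportions from the arc counts quoted above. The short sketch below uses only those four numbers; the variable names and the derived "enrichment" ratio are illustrative additions, not quantities reported in the talk.

```python
# Arc counts quoted in the slides.
putative_arcs = 25604      # arcs in the putative E. coli TRN
validated_arcs = 444       # of which experimentally validated
kept_arcs = 4817           # arcs kept by LOMBARDE
kept_validated = 289       # validated arcs kept by LOMBARDE

kept_fraction = kept_arcs / putative_arcs
kept_validated_fraction = kept_validated / validated_arcs

# Share of validated arcs before and after filtering (derived, illustrative).
validated_share_before = validated_arcs / putative_arcs
validated_share_after = kept_validated / kept_arcs
enrichment = validated_share_after / validated_share_before

print(f"arcs kept: {kept_fraction:.1%}")
print(f"validated arcs kept: {kept_validated_fraction:.1%}")
print(f"validated share before: {validated_share_before:.1%}")
print(f"validated share after: {validated_share_after:.1%}")
print(f"enrichment factor: {enrichment:.2f}")
```

A validated arc is retained about 3.5 times more often than an arbitrary arc of the putative TRN, which is the bias the slide refers to.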
To further evaluate the effect of a possible improvement in putative TRN prediction, we compared the results of LOMBARDE applied to the original putative TRN and to the extended TRN for E. coli.
This shows that, using the current putative TRN prediction methods, LOMBARDE is capable of detecting a core of key regulations which explains the observed co-expressions, and confirms the bias of LOMBARDE towards validated arcs under more precise putative TRN predictions.
The network produced by LOMBARDE also contains most of the global regulators described for E. coli:
Using the radiality index, we ranked the regulators in the LOMBARDE output. Among the most relevant regulators in this network we recovered 10 of the known global regulators.
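As a rough illustration of such a ranking, the sketch below computes radiality under one common definition: for a node v, the average over all other nodes w of (D + 1 - d(v, w)), where d is the shortest-path distance and D the graph diameter. The exact variant used in the talk is not stated, so this definition, the toy undirected connected graph, and the function names are all assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def radiality(adj):
    """Radiality of every node; assumes a connected, undirected graph."""
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in all_dist.values())
    n = len(adj)
    return {
        v: sum(diameter + 1 - d for w, d in all_dist[v].items() if w != v) / (n - 1)
        for v in adj
    }

# Hypothetical toy network: "hub" plays the role of a global regulator.
toy_adj = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}

scores = radiality(toy_adj)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])
```

Nodes that are close to all others obtain the highest radiality, which is why well-connected regulators rise to the top of the ranking.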
LOMBARDE produces networks with realistic degree distributions, recovering and giving a central role to most of the global regulators described in the literature.
In other words, LOMBARDE shapes the resulting network towards the structural characteristics of a true regulatory network.
Alejandro Maass (CRG, CMM)
Servet Martínez (CMM)
Mauricio González (CRG, INTA)
BioSigma
Anne Siegel (INRIA)
Miguel Allende (CRG)
MATHomics team