Context-specific metabolic networks from omics#

Here we will illustrate the use of the iMAT method, as implemented in CORNETO, to extract context specific metabolic networks from transcriptomics. CORNETO implements an extension of iMAT to support multiple samples, and also gives more control for customization.

We will use the core E. coli PKN, and microarray-based gene expression data corresponding to a anaerobic growth on glucose of a wild-type E. coli strain. This tutorial is based on the original COBRA tutorial.

import corneto as cn
import pandas as pd
import numpy as np

cn.info()
Installed version:v1.0.0.dev5 (latest stable: v1.0.0-alpha)
Available backends:CVXPY v1.6.5, PICOS v2.6.1
Default backend (corneto.opt):CVXPY
Installed solvers:CVXOPT, GLPK, GLPK_MI, HIGHS, SCIP, SCIPY
Graphviz version:v0.20.3
Installed path:/home/runner/work/corneto/corneto/corneto
Repository:https://github.com/saezlab/corneto
from corneto.methods.future.imat import MultiSampleIMAT

MultiSampleIMAT.show_citations()
  • Tomer Shlomi, Moran N Cabili, Markus J Herrgård, Bernhard Ø Palsson, Eytan Ruppin. Network-based prediction of human tissue-specific metabolism. Nature biotechnology, Vol. 26, No. 9, pp. 1003--1010 (2008).
  • Pablo Rodriguez-Mier, Martin Garrido-Rodriguez, Attila Gabor, Julio Saez-Rodriguez. Unified knowledge-driven network inference from omics data. bioRxiv, pp. 10 (2024).

Loading data from COBRA#

We will use the ecoli core metabolic network and ecoli data example obtained from the COBRA package. This dataset contains 4 biological replicates. In this tutorial, we will use the single sample iMAT, we will average the expression and use this for analysis, since we are not interesed in analysing differences of the replicates

from corneto.io import import_cobra_model

G = import_cobra_model("../../tests/gem/ecoli_core.zip")
df_expr = pd.read_csv("../../tests/gem/ecoli_example_data.zip")
df_expr
gene raw_value_1 raw_value_2 raw_value_3 raw_value_4 value
0 b0008 11.211265 12.030984 12.017371 11.674179 11.733450
1 b0114 11.767118 11.581649 11.885833 11.571045 11.701411
2 b0115 11.020531 10.813589 11.144573 11.362138 11.085208
3 b0116 10.299068 10.562963 10.788094 10.425459 10.518896
4 b0118 6.116122 6.437554 6.956686 6.229419 6.434945
... ... ... ... ... ... ...
131 b4153 9.647255 10.089110 8.685851 8.902372 9.331147
132 b4154 9.176739 9.387545 8.841818 8.510611 8.979178
133 b4232 9.490210 9.201019 9.321175 9.235511 9.311979
134 b4301 2.335597 2.338372 2.317967 2.270161 2.315524
135 b4395 2.431005 2.827751 2.822822 2.825172 2.726687

136 rows × 6 columns

G.shape
(77, 91)
# We can inspect the content of the network
G.get_attr_edges()[:5]
[{'__edge_type': 'directed',
  'id': 'EX_ac_b',
  '__source_attr': {'ac_b': {'__value': np.float64(-1.0)}},
  '__target_attr': {},
  'default_lb': np.float64(-1000.0),
  'default_ub': np.float64(1000.0),
  'GPR': ''},
 {'__edge_type': 'directed',
  'id': 'EX_akg_b',
  '__source_attr': {'akg_b': {'__value': np.float64(-1.0)}},
  '__target_attr': {},
  'default_lb': np.float64(-1000.0),
  'default_ub': np.float64(1000.0),
  'GPR': ''},
 {'__edge_type': 'directed',
  'id': 'EX_co2_b',
  '__source_attr': {'co2_b': {'__value': np.float64(-1.0)}},
  '__target_attr': {},
  'default_lb': np.float64(-1000.0),
  'default_ub': np.float64(1000.0),
  'GPR': ''},
 {'__edge_type': 'directed',
  'id': 'EX_etoh_b',
  '__source_attr': {'etoh_b': {'__value': np.float64(-1.0)}},
  '__target_attr': {},
  'default_lb': np.float64(-1000.0),
  'default_ub': np.float64(1000.0),
  'GPR': ''},
 {'__edge_type': 'directed',
  'id': 'EX_for_b',
  '__source_attr': {'for_b': {'__value': np.float64(-1.0)}},
  '__target_attr': {},
  'default_lb': np.float64(-1000.0),
  'default_ub': np.float64(1000.0),
  'GPR': ''}]

Mapping data using Gene-Protein-Reaction (GPR) rules#

We need to compute reaction scores from gene expression data. For this, we will classify genes as highly expressed and lowly expressed based on thresholds applied to the marginal distribution of gene intensities. Then, we will map these values by applying the Gene-Protein-Reaction (GPR) rules included in the Ecoli PKN.

from corneto.methods.metabolism import evaluate_gpr_expression, get_unique_genes

genes = get_unique_genes(G, startswith="b")
len(genes)
101
df_expr.set_index("gene").stack().to_frame(name='expression').hist(bins=30);
../_images/e284dff9c8e6c0c7b66c731fe744be7caab4bfb448410370d15bb62974ecec53.png
df_expr["high"] = (df_expr["value"] >= 8).astype(int)
df_expr["low"] = (df_expr["value"] <= 4).astype(int)
df_expr["score"] = df_expr["high"] - df_expr["low"]
df_expr
gene raw_value_1 raw_value_2 raw_value_3 raw_value_4 value high low score
0 b0008 11.211265 12.030984 12.017371 11.674179 11.733450 1 0 1
1 b0114 11.767118 11.581649 11.885833 11.571045 11.701411 1 0 1
2 b0115 11.020531 10.813589 11.144573 11.362138 11.085208 1 0 1
3 b0116 10.299068 10.562963 10.788094 10.425459 10.518896 1 0 1
4 b0118 6.116122 6.437554 6.956686 6.229419 6.434945 0 0 0
... ... ... ... ... ... ... ... ... ...
131 b4153 9.647255 10.089110 8.685851 8.902372 9.331147 1 0 1
132 b4154 9.176739 9.387545 8.841818 8.510611 8.979178 1 0 1
133 b4232 9.490210 9.201019 9.321175 9.235511 9.311979 1 0 1
134 b4301 2.335597 2.338372 2.317967 2.270161 2.315524 0 1 -1
135 b4395 2.431005 2.827751 2.822822 2.825172 2.726687 0 1 -1

136 rows × 9 columns

values = df_expr[["gene", "score"]].set_index("gene").to_dict()["score"]
e = evaluate_gpr_expression(G.get_attr_from_edges("GPR"), values)
pd.DataFrame({"id": G.get_attr_from_edges("id"), "score": e})
id score
0 EX_ac_b 0
1 EX_akg_b 0
2 EX_co2_b 0
3 EX_etoh_b 0
4 EX_for_b 0
... ... ...
86 TALA 1
87 THD2 1
88 TKT1 1
89 TKT2 1
90 TPI 1

91 rows × 2 columns

Context-specific network inference#

We will use the reaction scores to create a Data object. The id corresponds to the reaction id of the PKN, and the value to the reaction score. We indicate that these features map to edges (reactions) with mapping = edge. Note that if mapping = none, then it will be assumed that the dataset contains genes, and automatic GPR will be performed to translate from genes to reaction scores.

# Create a dataset (here we only have 1 sample, which is the mean of the four measurements)
from corneto._data import Data

feats = dict()
for edge, val in zip(G.get_attr_from_edges("id"), e):
    feats[edge] = {"value": val, "mapping": "edge"}
data = {
    "sample1": feats
}
data = Data.from_cdict(data)
data
Data(n_samples=1, n_feats=[91])

We run the iMAT method (in this case, only 1 sample, which is the mean of the 4 biological replicates)

# eps is the min absolute flux value that we will consider
# for reactions with positive scores, i.e., if a reaction
# with a positive score is included in the context specific
# metabolic network, it has to be able to carry an absolute
# metabolic flux value above eps.
eps = 1e-2

# We will also add a small penalty to make networks sparser.
# When we have more than 1 sample, this parameter controls 
# the regularization across samples, blocking entire groups
# of reactions across samples. Here, we are just penalizing
# the inclusion of reactions not needed to fit the reaction
# scores
m = MultiSampleIMAT(lambda_reg=1e-3, eps=eps)
P = m.build(G, data)
P.expr
{'_flow': _flow: Variable((91,), _flow),
 'edge_has_flux': edge_has_flux: Variable((91,), edge_has_flux, boolean=True),
 '_flow_ipos': _flow_ipos: Variable((91,), _flow_ipos, boolean=True),
 '_flow_ineg': _flow_ineg: Variable((91,), _flow_ineg, boolean=True),
 'flow': _flow: Variable((91,), _flow)}
P.solve(solver="scipy", verbosity=0);

The final result indicates that 1 reaction with positive score was not selected, and 1 reaction with negative score was included. In total, the network contains 61 reactions with fluxes. The last value corresponds to the number of reactions that were not blocked.

for o in P.objectives:
    print(o.value)
1.0
1.0
60.0
# Metabolic fluxes correspond to values of the flow variable
# Note that there are 91 values, corresponding to the 91 reactions
# in the PKN

df_sol = pd.DataFrame(
    P.expr.flow.value, index=G.get_attr_from_edges("id"), columns=["flux"]
)
df_sol
flux
EX_ac_b 3.935488e+00
EX_akg_b 3.469447e-18
EX_co2_b 0.000000e+00
EX_etoh_b 4.095543e+00
EX_for_b 8.299186e+00
... ...
TALA 1.000000e-02
THD2 8.404543e-01
TKT1 1.000000e-02
TKT2 -1.000000e-02
TPI 4.391463e+00

91 rows × 1 columns

df_sol.hist(bins=30)
array([[<Axes: title={'center': 'flux'}>]], dtype=object)
../_images/d9585fb7b00a8f35d203ce8301fa8668ce70f93af24a0809a55cfeff8f4a3b08.png
selected_reactions = df_sol[df_sol.flux.abs() >= eps*(1-eps)].index.values.tolist()
len(selected_reactions)
60
np.sum(P.expr.edge_has_flux.value)
np.float64(60.0)
# Extract the context specific network
G_ctx = G.edge_subgraph(np.flatnonzero(P.expr.edge_has_flux.value))
G_ctx.shape
(57, 60)
G_ctx.plot(layout="neato")
../_images/280699531c43bc2d5d7b01c049923922966e791947d6731df5fbb81da0b54533.svg

Old version#

Original experiments in the manuscript used the old multicondition_imat:

from corneto.methods.metabolism.fba import multicondition_imat

P = multicondition_imat(G, np.array(e), alpha=1e-3, eps=eps)
P.solve(solver="scipy");
for o in P.objectives:
    print(o.value)
2.0
60.0

Citations#

MultiSampleIMAT.show_bibtex()
@article{shlomi2008network, title={Network-based prediction of human tissue-specific metabolism}, author={Shlomi, Tomer and Cabili, Moran N and Herrg{\aa}rd, Markus J and Palsson, Bernhard {\O} and Ruppin, Eytan}, journal={Nature biotechnology}, volume={26}, number={9}, pages={1003--1010}, year={2008}, publisher={Nature Publishing Group US New York} } @article{rodriguez2024unified, title={Unified knowledge-driven network inference from omics data}, author={Rodriguez-Mier, Pablo and Garrido-Rodriguez, Martin and Gabor, Attila and Saez-Rodriguez, Julio}, journal={bioRxiv}, pages={10}, year={2024}, publisher={Cold Spring Harbor Laboratory} }