CARNIVAL#
CARNIVAL (CAusal Reasoning for Network identification using Integer VALue programming) is a method for the identification of upstream reguatory signalling pathways from downstream gene expression (GEX). Applications of CARNIVAL include the identification of drug’s modes of action and of deregulated processes in diseases (even if the molecular targets remain unknown) by deciphering the alterations of main signalling pathways as well as alternative pathways and off-target effects.
![CARNIVAL abstract](/_static/carnival-abstract.png)
Figure 1: Liu A., Trairatphisan P., Gjerga E. et al. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL npj Systems Biology and Applications volume 5, Article number: 40 (2019) (equal contributions).
The aim of the CARNIVAL pipeline is to identify a subset of interactions from a prior knowledge network that represent potential regulated pathways linking known or potential targets of perturbation towards active transcription factors derived from GEX data. The pipeline includes a number improved functionalities comparing to the original version and consists of the following processes:
Transcription factors’ (TFs) activities and pathway scores from gene expressions can be inferred with our in-house tools (Dorothea, CollecTRI).
TFs’ activities and signed directed protein-protein interaction networks with or without the provided target of perturbations and pathway scores are then used to construct an optimization problem with CORNETO.
CORNETO is used to solve the optimization problem with any of the supported solvers (CPLEX, GUROBI, SCIPY, etc), which identifies the sub-network topology with minimised fitting error and model size.
The original version of CARNIVAL was implemented in R and CPLEX. The new re-implementationo of CARNIVAL in CORNETO support a wide variety of solvers thanks to the support of both CVXPY and PICOS. It also has more flexibility since the problem is symbolically defined, and can be modified through the CORNETO API after creating the CARNIVAL problem. This gives user extra flexibility to modify the problem or to use CORNETO as a building block for other optimization problems.
import corneto as cn
cn.info()
|
|
A quick example#
G = cn.Graph.from_sif_tuples(
[
("I1", 1, "N1"), # I1 activates N1
("N1", 1, "M1"), # N1 activates M1
("N1", 1, "M2"), # N1 activaes M2
("I2", -1, "N2"), # I2 inhibits N2
("N2", -1, "M2"), # N2 inhibits M2
("N2", -1, "M1"), # N2 inhibits M1
]
)
G.plot()
from corneto.methods import runVanillaCarnival
# These are the measurements (e.g. TF activity from Decoupler).
# Positive values correspond to up-regulation and negative values
# with down-regulation. The bigger the absolute value is,
# the bigger the importance is
measurements = {"M1": 1, "M2": 1}
# Perturbations are the upstream nodes were the signal originates on,
# for example, ligands or receptors.
perturbations = {"I1": 1, "I2": 1}
# We run the `standard` carnival problem. This interface is similar
# to the old R function https://saezlab.github.io/CARNIVAL/reference/runVanillaCarnival.html
P, Gf = runVanillaCarnival(perturbations, measurements, G, betaWeight=0.1)
2/2 inputs mapped to the graph
2/2 outputs mapped to the graph
Pruning the graph with size: V x E = (6, 6)...
Finished. Final size: V x E = (6, 6).
2/2 inputs after pruning.
2/2 outputs after pruning.
Converting into a flow graph...
Creating a network flow problem...
Preprocess completed.
===============================================================================
CVXPY
v1.5.3
===============================================================================
(CVXPY) Jan 20 10:02:20 AM: Your problem has 86 variables, 197 constraints, and 0 parameters.
(CVXPY) Jan 20 10:02:20 AM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jan 20 10:02:20 AM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Jan 20 10:02:20 AM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jan 20 10:02:20 AM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
Compilation
-------------------------------------------------------------------------------
(CVXPY) Jan 20 10:02:20 AM: Compiling problem (target solver=SCIP).
(CVXPY) Jan 20 10:02:20 AM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> SCIP
(CVXPY) Jan 20 10:02:20 AM: Applying reduction Dcp2Cone
(CVXPY) Jan 20 10:02:20 AM: Applying reduction CvxAttr2Constr
(CVXPY) Jan 20 10:02:20 AM: Applying reduction ConeMatrixStuffing
(CVXPY) Jan 20 10:02:20 AM: Applying reduction SCIP
(CVXPY) Jan 20 10:02:20 AM: Finished problem compilation (took 2.890e-02 seconds).
-------------------------------------------------------------------------------
Numerical solver
-------------------------------------------------------------------------------
(CVXPY) Jan 20 10:02:20 AM: Invoking solver SCIP to obtain a solution.
presolving:
(round 1, fast) 36 del vars, 95 del conss, 0 add conss, 48 chg bounds, 1 chg sides, 23 chg coeffs, 0 upgd conss, 0 impls, 28 clqs
(round 2, fast) 54 del vars, 131 del conss, 0 add conss, 53 chg bounds, 10 chg sides, 32 chg coeffs, 0 upgd conss, 0 impls, 20 clqs
(round 3, fast) 61 del vars, 145 del conss, 0 add conss, 53 chg bounds, 12 chg sides, 34 chg coeffs, 0 upgd conss, 0 impls, 18 clqs
(round 4, fast) 64 del vars, 147 del conss, 0 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 0 upgd conss, 0 impls, 18 clqs
(round 5, fast) 66 del vars, 151 del conss, 0 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 0 upgd conss, 0 impls, 18 clqs
(round 6, exhaustive) 66 del vars, 151 del conss, 0 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 40 upgd conss, 0 impls, 18 clqs
(0.0s) probing cycle finished: starting next cycle
(round 7, exhaustive) 68 del vars, 151 del conss, 0 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 40 upgd conss, 28 impls, 46 clqs
(round 8, medium) 68 del vars, 155 del conss, 0 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 40 upgd conss, 28 impls, 46 clqs
(0.0s) probing cycle finished: starting next cycle
(0.0s) symmetry computation started: requiring (bin +, int +, cont +), (fixed: bin -, int -, cont -)
(0.0s) symmetry computation finished: 2 generators found (max: 1500, log10 of symmetry group size: 0.0) (symcode time: 0.00)
dynamic symmetry handling statistics:
orbitopal reduction: no components
orbital reduction: no components
lexicographic reduction: no permutations
handled 1 out of 1 symmetry components
(round 9, exhaustive) 68 del vars, 155 del conss, 3 add conss, 53 chg bounds, 16 chg sides, 38 chg coeffs, 40 upgd conss, 28 impls, 46 clqs
presolving (10 rounds: 10 fast, 5 medium, 4 exhaustive):
68 deleted vars, 155 deleted constraints, 3 added constraints, 53 tightened bounds, 0 added holes, 16 changed sides, 38 changed coefficients
28 implications, 46 cliques
presolved problem has 27 variables (19 bin, 0 int, 0 impl, 8 cont) and 45 constraints
16 constraints of type <varbound>
18 constraints of type <setppc>
9 constraints of type <linear>
2 constraints of type <logicor>
transformed objective value is always integral (scale: 0.1)
Presolving Time: 0.00
time | node | left |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr| dualbound | primalbound | gap | compl.
p 0.0s| 1 | 0 | 3 | - | clique| 0 | 27 | 45 | 42 | 0 | 0 | 0 | 0 |-3.500000e+00 | 1.000000e-01 | Inf | unknown
p 0.0s| 1 | 0 | 3 | - | locks| 0 | 27 | 45 | 42 | 0 | 0 | 0 | 0 |-3.500000e+00 |-6.000000e-01 | 483.33%| unknown
p 0.0s| 1 | 0 | 3 | - | vbounds| 0 | 27 | 46 | 42 | 0 | 0 | 1 | 0 |-3.500000e+00 |-1.600000e+00 | 118.75%| unknown
p 0.0s| 1 | 0 | 3 | - | vbounds| 0 | 27 | 48 | 42 | 0 | 0 | 3 | 0 |-3.500000e+00 |-3.200000e+00 | 9.37%| unknown
0.0s| 1 | 0 | 26 | - | 1784k | 0 | 27 | 49 | 42 | 0 | 0 | 8 | 0 |-3.300000e+00 |-3.200000e+00 | 3.13%| unknown
r 0.0s| 1 | 0 | 26 | - |rounding| 0 | 27 | 49 | 39 | 0 | 0 | 8 | 0 |-3.300000e+00 |-3.300000e+00 | 0.00%| unknown
0.0s| 1 | 0 | 26 | - | 1784k | 0 | 27 | 49 | 39 | 0 | 0 | 8 | 0 |-3.300000e+00 |-3.300000e+00 | 0.00%| unknown
0.0s| 1 | 0 | 26 | - | 1784k | 0 | 27 | 49 | 39 | 0 | 0 | 8 | 0 |-3.300000e+00 |-3.300000e+00 | 0.00%| unknown
SCIP Status : problem is solved [optimal solution found]
Solving Time (sec) : 0.01
Solving Nodes : 1
Primal Bound : -3.30000000000000e+00 (5 solutions)
Dual Bound : -3.30000000000000e+00
Gap : 0.00 %
-------------------------------------------------------------------------------
Summary
-------------------------------------------------------------------------------
(CVXPY) Jan 20 10:02:20 AM: Problem status: optimal
(CVXPY) Jan 20 10:02:20 AM: Optimal value: 7.000e-01
(CVXPY) Jan 20 10:02:20 AM: Compilation took 2.890e-02 seconds
(CVXPY) Jan 20 10:02:20 AM: Solver (including time spent in interface) took 1.934e-02 seconds
Finished in 0.06 s.
The carnival problem has two objectives P.objectives
:
P.objectives[0]
: the error of fitting the measurements (Transcription Factors)P.objectives[1]
: the size of the network
# We can check the error. Values larger than 0 indicate that
# some TF or measurement was not selected.
P.objectives[0].value
array([0.])
# Variables that are defined by CARNIVAL
P.expr
{'_flow': Variable((14,), _flow),
'_flow_ipos': Variable((14,), _flow_ipos, boolean=True),
'species_inhibited_c0': Variable((10,), species_inhibited_c0, boolean=True),
'reaction_sends_inhibition_c0': Variable((14,), reaction_sends_inhibition_c0, boolean=True),
'reaction_sends_activation_c0': Variable((14,), reaction_sends_activation_c0, boolean=True),
'species_activated_c0': Variable((10,), species_activated_c0, boolean=True),
'dag_layer_position_c0': Variable((10,), dag_layer_position_c0),
'flow': Variable((14,), _flow),
'edge_values_c0': Expression(AFFINE, UNKNOWN, (14,)),
'vertex_values_c0': Expression(AFFINE, UNKNOWN, (10,))}
Gf.plot_values(vertex_values=P.expr.vertex_values_c0, edge_values=P.expr.edge_values_c0)
G.plot(
custom_edge_attr=cn.pl.edge_style(P, edge_var="edge_values_c0"),
custom_vertex_attr=cn.pl.vertex_style(P, Gf, vertex_var="vertex_values_c0"),
)
import pandas as pd
from corneto.methods.carnival import get_result, get_selected_edges
V, E = get_result(P, Gf)
pd.DataFrame(V)
V | value | |
---|---|---|
0 | I2 | 1.0 |
1 | I1 | 1.0 |
2 | N2 | -1.0 |
3 | M1 | 1.0 |
4 | N1 | 0.0 |
5 | M2 | 1.0 |
6 | _s | 1.0 |
7 | _pert_c0 | 1.0 |
8 | _meas_c0 | 0.0 |
9 | _t | 0.0 |
pd.DataFrame(E)
E | value | |
---|---|---|
0 | ((I1), (N1)) | 0.0 |
1 | ((N1), (M1)) | 0.0 |
2 | ((N1), (M2)) | 0.0 |
3 | ((I2), (N2)) | -1.0 |
4 | ((N2), (M2)) | 1.0 |
5 | ((N2), (M1)) | 1.0 |
6 | ((_s), (_pert_c0)) | 1.0 |
7 | ((_pert_c0), (I1)) | 1.0 |
8 | ((_pert_c0), (I2)) | 1.0 |
9 | ((M1), (_meas_c0)) | 1.0 |
10 | ((M2), (_meas_c0)) | 1.0 |
11 | ((_meas_c0), (_t)) | 0.0 |
12 | ((), (_s)) | 1.0 |
13 | ((_t), ()) | 0.0 |
G_sol = Gf.edge_subgraph(get_selected_edges(P, Gf))
G_sol.plot()
Fast carnival#
For larger problems, and when a fast solver is not available, we provide a heuristic version of CARNIVAL that uses a greedy algorithm to solve the problem faster. However, solutions are not optimal and the final size of the networks cannot be controlled, but provides a good approximation of the optimal solution.
from corneto.methods import fast_carnival
from corneto.methods.carnival import read_dataset
G, dict_data = read_dataset("carnival_example.zip")
G.shape
(4931, 13157)
dict_data
{'EGFR': ('P', -1),
'AEBP1': ('M', -1),
'CREB3': ('M', 1),
'EGR3': ('M', -1),
'ELK1': ('M', -1),
'FOSL2': ('M', -1),
'FOXA3': ('M', 1),
'FOXE1': ('M', 1),
'HIPK2': ('M', 1),
'HOXA2': ('M', 1),
'HOXB7': ('M', -1),
'HOXD3': ('M', 1),
'KDM5C': ('M', -1),
'KDM5D': ('M', 1),
'KLF2': ('M', -1),
'NEUROD1': ('M', -1),
'NFATC4': ('M', -1),
'NFYB': ('M', -1),
'NHLH2': ('M', 1),
'PLAGL2': ('M', -1),
'RUNX2': ('M', -1),
'SIRT1': ('M', 1),
'SUPT20H': ('M', 1),
'TBX20': ('M', -1),
'TET1': ('M', 1),
'ZBTB4': ('M', -1)}
c_inputs = {k: v[1] for k, v in dict_data.items() if v[0] == "P"}
c_outputs = {k: v[1] for k, v in dict_data.items() if v[0] == "M"}
Gc, selected_edges, paths, stats, errors = fast_carnival(
G, c_inputs, c_outputs, restricted_search=False
)
1/1 inputs mapped to the graph
16/25 outputs mapped to the graph
Pruning the graph with size: V x E = (4931, 13157)...
Finished. Final size: V x E = (1182, 5789).
SIRT1 (L3), ELK1 (L3), NFATC4 (L3), NEUROD1 (L4), HIPK2 (L3), ZBTB4 (L4), NFYB (L3), AEBP1 (L3), RUNX2 (L3), HOXB7 (L4)
L1 : 1 iters, 0.00 s.
L2 : 41 iters, 0.00 s.
> -EGFR -> +CRK -> +MAPK8 -> +SIRT1
> -EGFR -> +CRK -> +MAPK8 -> -NFATC4
> -EGFR -> -PTPN1 -> +ABL1 -> +HIPK2
> -EGFR -> -PRKDC -> -MAPK8 -> -ELK1
! conflict: -EGFR/-MAPK8 != -EGFR/+MAPK8
> -EGFR -> -MAP2K1 -> -MAPK1 -> -ELK1
> -EGFR -> -MAP2K1 -> -MAPK1 -> -AEBP1
> -EGFR -> -MAP2K1 -> -MAPK3 -> -RUNX2
L3 : 316 iters, 0.01 s.
> -EGFR -> -STAT1 -> +CREBBP -> ... -> -NFYB
> -EGFR -> -PIK3R1 -> -RAC1 -> ... -> -NEUROD1
> -EGFR -> -PTPN1 -> +ABL1 -> ... -> -ZBTB4
> -EGFR -> -ERBB2 -> +CDK1 -> ... -> -HOXB7
Finished (0.19 s)
> Number of selected edges: 26
> Total iterations: 2287
> Detected loops: 255
> Conflicts: 1
Total error: 15
Number of selected edges: 26
node_values = {}
for p in paths:
for k, v in p[1].items():
node_values[k] = v[1]
G_sol = Gc.edge_subgraph(selected_edges)
V = G_sol.V
vertex_values = [node_values.get(v, 0) for v in V]
errors
{'AEBP1': 0,
'CREB3': 1,
'EGR3': 1,
'ELK1': 0,
'FOSL2': 1,
'FOXA3': 1,
'FOXE1': 1,
'HIPK2': 0,
'HOXA2': 1,
'HOXB7': 0,
'HOXD3': 1,
'KDM5C': 1,
'KDM5D': 1,
'KLF2': 1,
'NEUROD1': 0,
'NFATC4': 0,
'NFYB': 0,
'NHLH2': 1,
'PLAGL2': 1,
'RUNX2': 0,
'SIRT1': 0,
'SUPT20H': 1,
'TBX20': 1,
'TET1': 1,
'ZBTB4': 0}
G_sol.plot_values(vertex_values=vertex_values)