Introduction
The goal of this tutorial is to show an example where dynamic logic model is used to predict drug response.
The tutorial is based on the paper
Eduati et al (2017) Drug resistance mechanisms in colorectal cancer dissected with cell type-specific dynamic logic models. Cancer Research. DOI: 10.1158/0008-5472.CAN-17-0078
On Github: https://github.com/saezlab/CRC-pathway-biomarkers
We investigate here the drug-response of colorectal cancer cell lines. For this, we use drug response data and a signaling dataset.
The Genomics of Drug Sensitivity in Cancer (GDSC), https://www.cancerrxgene.org/ offers drug response data for more than a 1000 human cancer cell lines, for hundreds of drugs. A small part of these data is can be found in "./data/IC50_GDSC.csv".
The perturbation dataset contains the short time signaling response of 14 colorectal cancer cell lines, where 14 phosphoproteins are measured under 43 perturbation conditions (5 stimuli, 7 inhibitors).
First, we construct signaling models based on the perturbation data to the cell lines, here we use the CNORode modelling package. In the next step, we will associate model features to drug response to see why certain cell lines respond to certain drugs and others do not. Here we use a linear modeling framework.
CNORode
CNORode is a member of the CellNOptR logic based tool family. It can translate the network to ordinary differential equation (ODE) model, fit the model parameters to data and make predictions.
Dependencies
# To install the packages from Bioconductor you will need the package BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager", repos = "https://ftp.gwdg.de/pub/misc/cran/")
}
# installs CellNOptR, CNORode and MEIGOR from Bioconductor
if (!requireNamespace("CellNOptR", quietly = TRUE)) {
BiocManager::install("CellNOptR")
}
if (!requireNamespace("CNORode", quietly = TRUE)) {
BiocManager::install("CNORode")
}
if (!requireNamespace("MEIGOR", quietly = TRUE)) {
BiocManager::install("MEIGOR")
}
# installs data manipulation packages from the tidyverse
if (!requireNamespace("dplyr", quietly = TRUE)) {
install.packages("dplyr", repos = "https://ftp.gwdg.de/pub/misc/cran/")
}
if (!requireNamespace("readr", quietly = TRUE)) {
install.packages("readr", repos = "https://ftp.gwdg.de/pub/misc/cran/")
}
if (!requireNamespace("tidyr", quietly = TRUE)) {
install.packages("tidyr", repos = "https://ftp.gwdg.de/pub/misc/cran/")
}
if (!requireNamespace("ggplot2", quietly = TRUE)) {
install.packages("ggplot2", repos = "https://ftp.gwdg.de/pub/misc/cran/")
}
Make sure to import the libraries
library(CellNOptR)
library(CNORode)
library(MEIGOR)
library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)
PART I: DRUG response exploration
IC50 <- read_csv("./data/tutorial_3/IC50_GDSC.csv") %>%
rename("cell_line" = "...1")
print(IC50)
## # A tibble: 14 × 31
## cell_line Gefitinib RDEA119 `CI-1040` Afatinib `Nutlin-3a` `PD-0325901`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CAR1 1.36 1.34 3.27 -0.831 5.51 -0.235
## 2 CCK81 -0.659 NA 2.74 -1.61 0.551 -1.72
## 3 COLO320HSR 1.46 3.78 4.15 1.22 4.26 0.658
## 4 DIFI -3.26 0.00908 1.61 -5.31 4.44 -1.97
## 5 HCT116 1.03 -0.510 1.46 -0.0704 1.14 -1.56
## 6 HT115 0.444 0.946 2.96 -0.640 4.55 -1.41
## 7 HT29 1.91 -2.24 0.230 0.849 4.68 -3.74
## 8 SKCO1 NA -1.49 NA NA 1.97 -4.58
## 9 SNUC2B 0.375 1.55 1.94 0.113 3.45 -1.44
## 10 SNUC5 2.13 0.246 1.79 0.739 5.01 -3.26
## 11 SW1116 NA NA NA NA NA NA
## 12 SW1463 NA NA NA NA NA NA
## 13 SW620 1.51 -0.802 -0.104 1.36 4.28 -3.73
## 14 SW837 NA NA NA NA NA NA
## # … with 24 more variables: SB590885 <dbl>, AZD6244 <dbl>, BMS-536924 <dbl>,
## # Cetuximab <dbl>, HG-5-88-01 <dbl>, (5Z)-7-Oxozeaenol <dbl>,
## # Trametinib <dbl>, Dabrafenib <dbl>, Afatinib (rescreen) <dbl>,
## # AZD6244.1 <dbl>, RDEA119 (rescreen) <dbl>, BMS-754807 <dbl>, OSI-906 <dbl>,
## # BMS-345541 <dbl>, Ruxolitinib <dbl>, LY317615 <dbl>, XMD14-99 <dbl>,
## # CP724714 <dbl>, CH5424802 <dbl>, EKB-569 <dbl>, OSI-930 <dbl>,
## # QL-XI-92 <dbl>, EX-527 <dbl>, THZ-2-102-1 <dbl>
IC50 %>%
pivot_longer(names_to = "drug_name", values_to = "log_IC50", -cell_line) %>%
ggplot() +
geom_point(aes(drug_name, log_IC50, col = cell_line)) +
coord_flip() +
theme_bw() +
ggtitle("Drug response of 14 CRC cell lines")
## Warning: Removed 49 rows containing missing values (geom_point).
Form the raw IC50 values we can see that there are some drugs that are more effective (Trametinib) than others, like XMD14-99. There are also cell-line differences, for example, DIFI shows stronger sensitivity to Afatinib than any other cell lines. What could be the reason for this?
PART II: cell-line models
The goal of part II is to build a cell-line specific model from the perturbation data using CNORode.
This model is an ordinary differential equation (ODE) model, where the equation for each state (\(x_A\)) can be written as \[\frac{dx_A}{dt} = \tau_A(B(f_1(x),f_2(x),...)-x_A) \] here
\(f_i(x)\) represents the incoming edges on node \(A\) with a transfer function. This transfer function typically has an S-shape. \[ f(x) = \frac{x^n}{x^n+k^n}\]
\(B\) is a Boolean homologue function. This is responsible to combine the incoming edges with the OR and AND gates. For example, an OR gate is represented by \(x_1\cdot x_2\).
\(\tau\) is a time parameter, that tells how fast node A adapts to the input.
the model has free parameters: a \(\tau\) for each node, and (\(k\),\(n\)) for each edge. These are found by optimisation.
The main differences are that in ODE models the states are continuous values, therefore it is quantitative, not only qualitative like a Boolean model. Further, here we haVe to find the specific edge and node parameters.
What do we need to build and simulate a differential equation model?
- the equations are derived from the network graph
- inputs: given in the MIDAS description
- Initial conditions for each state in each experiment
In this example, the baseline is set to 0.5. A value of 1 means full activation and 0 means full inhibition of the node.
Perturbation data
The following heatmap shows an overview on the perturbation data. The first block outlines the combinations of treatment. Then each other block represents the response of a cell line. Different columns within a block shows the different phosphoprotein markers.
In the tutorial we make a single model for the first cell line COLO320HSR.
Model a single cell line
Similarly to the previous tutorial with CellNopt, here we also start by importing a prior knowledge network and the perturbation data in MIDAS format.
# load Prior Knowledge Network (PKN)
pknmodel <- readSIF("./data/tutorial_3/PKN.sif")
# load normalised perturbation data
# select MIDAS file for the desired cell line
MIDASfile <- "./data/tutorial_3/processed/MIDAS/MD-COLO320HSR_Ktuned_v1_n4_all_noEGFRi_CNORode.csv"
Mydata <- readMIDAS(MIDASfile = MIDASfile, verbose = FALSE)
cnolist <- makeCNOlist(Mydata, subfield = FALSE)
## [1] "Please be aware that if you only have some conditions at time zero (e.g.only inhibitor/no inhibitor), the measurements for these conditions will be copied across matching conditions at t=0"
cnolist$valueStimuli[cnolist$valueStimuli == 0] <- 0.5
Show the network first
plotModel(pknmodel, cnolist)
As in CellNOPt:
- green nodes are stimulated in some experiments
- blue nodes are measured
- white nodes are modelled, but not measured
- red nodes or red bordered nodes are occasionally inhibited
- black edges represents activation, red T-shaped arrows represents inhibition
Then the data in CellNOpt format:
plotCNOlist(cnolist)
The data is very large, therefore is hard to see the details, but we can notice as some nodes increases their activity at the final time (30 mins).
# compress the network (no expansion, only OR gates are considered)
model <- preprocessing(
data = cnolist, model = pknmodel,
compression = TRUE, expansion = FALSE
)
## [1] "The following species are measured: AKT, EGFR, ERK, GSK3, IkBa, JNK, MEK, PI3K, RPS6, RSKp90, SMAD2, cJun, mTOR, p38, TAK1, TGFRb, IRS1, M3K1, MP2K4, S6K, TBK1, IKK, RASK, BRAF, PAK1"
## [1] "The following species are stimulated: PLX, EGF, HGF, IGF1, TGFb, TNFa"
## [1] "The following species are inhibited: IKK, MEK, PI3K, BRAF, TAK1, TBK1, mTOR"
## [1] "The following species are not observable and/or not controllable: "
# set initial parameters
# (here parameters 'k' and 'tau' are optimised and 'n' fixed to 3)
ode_parameters <- createLBodeContPars(model,
LB_n = 1, LB_k = 0, LB_tau = 0,
UB_n = 3, UB_k = 1, UB_tau = 1,
default_n = 3,
default_k = 0.5,
default_tau = 0.01,
opt_n = FALSE, opt_k = TRUE, opt_tau = TRUE,
random = TRUE
)
# PLX -> BRAF is an artificial regulation used to model paradoxical effect of
# PLX4720, which works as selective BRAF inhibitor in cell-lines where BRAF is
# mutated in V600E (i.e. HT29 and SNUC5 in our panel), but induces a paradoxical
# activation of wild type BRAF cells (modeled as stimulus on those cell lines)
ode_parameters$parValues[which(ode_parameters$parNames == "PLX_k_BRAF")] <- 0.5
ode_parameters$index_opt_pars <- setdiff(
ode_parameters$index_opt_pars,
which(ode_parameters$parNames == "PLX_k_BRAF")
)
## Parameter Optimization
# essm
paramsSSm <- defaultParametersSSm()
paramsSSm$local_solver <- "DHC"
paramsSSm$maxtime <- 30
# 36000;
paramsSSm$transfer_function <- 4
The actual optimisation takes around 10 mins, instead of the 30 sec. So, instead of running it here, we just load the results:
opt_pars <- parEstimationLBode(cnolist, model,
method = "essm",
ode_parameters = ode_parameters,
paramsSSm = paramsSSm
)
# write_rds(opt_pars,"data/tutorial_3/opt_pars_30sec.RDS")
opt_pars <- read_rds("data/tutorial_3/opt_pars_30sec.RDS")
Plot the fit of the model:
sim_res <- CNORode::plotLBodeFitness(cnolist, model, ode_parameters = opt_pars)
The fit is a bit better than a random model, but these optimisations should be run for around 10 hours.
We are interested in the optimised model parameters of this model.
opt_par_values <- opt_pars$parValues
names(opt_par_values) <- opt_pars$parNames
opt_par_values
## TGFRb_n_TAK1 TGFRb_k_TAK1 TNFa_n_TAK1 TNFa_k_TAK1 tau_TAK1
## 3.000000e+00 6.290662e-01 3.000000e+00 0.000000e+00 7.787088e-03
## TGFb_n_TGFRb TGFb_k_TGFRb EGFR_n_TGFRb EGFR_k_TGFRb tau_TGFRb
## 3.000000e+00 2.802733e-01 3.000000e+00 1.542701e-01 0.000000e+00
## EGF_n_EGFR EGF_k_EGFR ERK_n_EGFR ERK_k_EGFR tau_EGFR
## 3.000000e+00 0.000000e+00 3.000000e+00 2.007754e-07 7.015971e-03
## S6K_n_IRS1 S6K_k_IRS1 TBK1_n_IRS1 TBK1_k_IRS1 mTOR_n_IRS1
## 3.000000e+00 0.000000e+00 3.000000e+00 0.000000e+00 3.000000e+00
## mTOR_k_IRS1 IGF1_n_IRS1 IGF1_k_IRS1 MEK_n_IRS1 MEK_k_IRS1
## 0.000000e+00 3.000000e+00 0.000000e+00 3.000000e+00 4.060043e-07
## tau_IRS1 PI3K_n_AKT PI3K_k_AKT tau_AKT PI3K_n_M3K1
## 0.000000e+00 3.000000e+00 4.100319e-02 0.000000e+00 3.000000e+00
## PI3K_k_M3K1 RASK_n_M3K1 RASK_k_M3K1 tau_M3K1 ERK_n_RPS6
## 4.432509e-01 3.000000e+00 5.075054e-08 1.568557e-09 3.000000e+00
## ERK_k_RPS6 S6K_n_RPS6 S6K_k_RPS6 RSKp90_n_RPS6 RSKp90_k_RPS6
## 2.498247e-01 3.000000e+00 0.000000e+00 3.000000e+00 8.162135e-01
## tau_RPS6 IKK_n_ERK IKK_k_ERK MEK_n_ERK MEK_k_ERK
## 1.027507e-02 3.000000e+00 3.065270e-01 3.000000e+00 1.271568e-01
## tau_ERK TAK1_n_MP2K4 TAK1_k_MP2K4 M3K1_n_MP2K4 M3K1_k_MP2K4
## 9.911941e-03 3.000000e+00 1.000000e+00 3.000000e+00 0.000000e+00
## TNFa_n_MP2K4 TNFa_k_MP2K4 tau_MP2K4 mTOR_n_S6K mTOR_k_S6K
## 3.000000e+00 0.000000e+00 0.000000e+00 3.000000e+00 5.503381e-01
## PI3K_n_S6K PI3K_k_S6K tau_S6K IKK_n_TBK1 IKK_k_TBK1
## 3.000000e+00 0.000000e+00 1.462364e-05 3.000000e+00 6.452658e-01
## tau_TBK1 AKT_n_mTOR AKT_k_mTOR tau_mTOR TAK1_n_IKK
## 9.931725e-03 3.000000e+00 0.000000e+00 0.000000e+00 3.000000e+00
## TAK1_k_IKK AKT_n_IKK AKT_k_IKK M3K1_n_IKK M3K1_k_IKK
## 8.246211e-01 3.000000e+00 0.000000e+00 3.000000e+00 0.000000e+00
## TNFa_n_IKK TNFa_k_IKK TBK1_n_IKK TBK1_k_IKK tau_IKK
## 3.000000e+00 0.000000e+00 3.000000e+00 8.707054e-05 0.000000e+00
## EGFR_n_PI3K EGFR_k_PI3K IRS1_n_PI3K IRS1_k_PI3K RPS6_n_PI3K
## 3.000000e+00 1.599331e-01 3.000000e+00 0.000000e+00 3.000000e+00
## RPS6_k_PI3K TNFa_n_PI3K TNFa_k_PI3K IGF1_n_PI3K IGF1_k_PI3K
## 1.000000e+00 3.000000e+00 0.000000e+00 3.000000e+00 0.000000e+00
## HGF_n_PI3K HGF_k_PI3K RASK_n_PI3K RASK_k_PI3K tau_PI3K
## 3.000000e+00 4.751120e-01 3.000000e+00 3.009077e-02 3.042961e-02
## EGFR_n_RASK EGFR_k_RASK IGF1_n_RASK IGF1_k_RASK HGF_n_RASK
## 3.000000e+00 0.000000e+00 3.000000e+00 0.000000e+00 3.000000e+00
## HGF_k_RASK tau_RASK BRAF_n_MEK BRAF_k_MEK PAK1_n_MEK
## 5.156353e-07 0.000000e+00 3.000000e+00 7.505019e-03 3.000000e+00
## PAK1_k_MEK tau_MEK ERK_n_BRAF ERK_k_BRAF RASK_n_BRAF
## 0.000000e+00 0.000000e+00 3.000000e+00 5.244096e-02 3.000000e+00
## RASK_k_BRAF PAK1_n_BRAF PAK1_k_BRAF PLX_n_BRAF PLX_k_BRAF
## 0.000000e+00 3.000000e+00 9.269049e-02 3.000000e+00 3.603077e-06
## tau_BRAF ERK_n_RSKp90 ERK_k_RSKp90 tau_RSKp90 TAK1_n_JNK
## 0.000000e+00 3.000000e+00 9.381897e-01 1.471951e-02 3.000000e+00
## TAK1_k_JNK IRS1_n_JNK IRS1_k_JNK M3K1_n_JNK M3K1_k_JNK
## 0.000000e+00 3.000000e+00 3.617738e-05 3.000000e+00 0.000000e+00
## TNFa_n_JNK TNFa_k_JNK MP2K4_n_JNK MP2K4_k_JNK tau_JNK
## 3.000000e+00 8.352255e-02 3.000000e+00 0.000000e+00 0.000000e+00
## AKT_n_PAK1 AKT_k_PAK1 PI3K_n_PAK1 PI3K_k_PAK1 tau_PAK1
## 3.000000e+00 5.496838e-01 3.000000e+00 0.000000e+00 0.000000e+00
## TAK1_n_p38 TAK1_k_p38 MP2K4_n_p38 MP2K4_k_p38 TBK1_n_p38
## 3.000000e+00 2.582269e-01 3.000000e+00 0.000000e+00 3.000000e+00
## TBK1_k_p38 tau_p38 TGFRb_n_SMAD2 TGFRb_k_SMAD2 tau_SMAD2
## 8.704423e-01 1.299676e-02 3.000000e+00 0.000000e+00 0.000000e+00
## AKT_n_GSK3 AKT_k_GSK3 RSKp90_n_GSK3 RSKp90_k_GSK3 tau_GSK3
## 3.000000e+00 0.000000e+00 3.000000e+00 0.000000e+00 0.000000e+00
## TAK1_n_IkBa TAK1_k_IkBa IKK_n_IkBa IKK_k_IkBa tau_IkBa
## 3.000000e+00 4.080694e-01 3.000000e+00 7.350341e-01 1.539245e-02
## JNK_n_cJun JNK_k_cJun tau_cJun
## 3.000000e+00 0.000000e+00 0.000000e+00
Similar to the above cell-line, we can build a model for each of the cell lines. This is very time consuming, therefore we just load the optimised parameters from the paper.
optimised_parameters <- read_delim("./data/tutorial_3/allModelsParameters.txt",
delim = "\t"
)
Let’s check edge and node parameters.
edge_id <- grep("_k_", colnames(optimised_parameters))
edge_parameters_HM <- optimised_parameters[edge_id] %>% as.matrix()
rownames(edge_parameters_HM) <- optimised_parameters$cell_line
# heatmap(edge_parameters_HM, main = "Cell line edge parameters")
node_id <- grep("tau_", colnames(optimised_parameters))
node_parameters_HM <- optimised_parameters[node_id] %>% as.matrix()
rownames(node_parameters_HM) <- optimised_parameters$cell_line
# heatmap(node_parameters_HM, main = "Cell line node parameters")
The level of edge and node parameters differs across the cell-lines.
PART III: Associate model parameters and drug response
Which model parameters correlates with drug response IC50 ?
# first we need to remove the parameters, that are zero across all the models
zero_pars <- names(which(colMeans(optimised_parameters[, -1]) == 0))
# join the IC50 data and network model parameters based on cell_lines
drug_model_data <- optimised_parameters %>%
select(-zero_pars) %>%
gather(parameter, par_value, -cell_line) %>%
left_join(IC50 %>% gather(drug, IC50, -cell_line), by = "cell_line")
# for each drug and each parameter compute the correlation coefficient
corr_data <- drug_model_data %>%
group_by(drug, parameter) %>%
summarise(corr_par_drug = cor(par_value, IC50, use = "complete.obs"))
Let’s show some correlation
corr_data %>%
arrange(desc(abs(corr_par_drug))) %>%
print(., n = 25)
## # A tibble: 2,070 × 3
## # Groups: drug [30]
## drug parameter corr_par_drug
## <chr> <chr> <dbl>
## 1 Afatinib tau_TBK1 -0.896
## 2 Afatinib tau_ERK -0.875
## 3 BMS-754807 tau_RPS6 0.861
## 4 Gefitinib tau_ERK -0.859
## 5 OSI-906 tau_RPS6 0.846
## 6 Afatinib PAK1_k_MEK 0.834
## 7 RDEA119 IKK_k_ERK 0.818
## 8 Gefitinib tau_TBK1 -0.813
## 9 SB590885 tau_IkBa -0.799
## 10 Afatinib (rescreen) tau_TBK1 -0.784
## 11 CI-1040 PI3K_k_S6K 0.777
## 12 CI-1040 M3K1_k_IKK -0.776
## 13 Afatinib (rescreen) tau_ERK -0.762
## 14 HG-5-88-01 ERK_k_BRAF 0.756
## 15 HG-5-88-01 ERK_k_RPS6 -0.754
## 16 PD-0325901 AKT_k_GSK3 0.732
## 17 RDEA119 PI3K_k_S6K 0.731
## 18 PD-0325901 PI3K_k_S6K 0.721
## 19 SB590885 RSKp90_k_GSK3 -0.717
## 20 (5Z)-7-Oxozeaenol PI3K_k_S6K 0.717
## 21 RDEA119 MP2K4_k_JNK 0.712
## 22 OSI-906 ERK_k_BRAF 0.708
## 23 Gefitinib tau_MEK 0.708
## 24 (5Z)-7-Oxozeaenol MP2K4_k_JNK 0.707
## 25 THZ-2-102-1 tau_SMAD2 0.706
## # … with 2,045 more rows
drug_model_data %>%
filter(drug == "Afatinib", parameter == "tau_TBK1") %>%
ggplot() +
geom_point(aes(par_value, IC50)) +
ggtitle("DRUG: Afatinib; parameter: tau_TBK1")
Think what the problem might be here?
We have only 14 cell-lines, therefore each of the correlations between model parameter and drug IC50 is based on 14 data points. There are 31 drugs and 89 model parameters, which results in 31*89=2759
tests.
Also this is only a single parameter - single drug association. It is possible, that the existence of multiple edges makes a cell-line sensitive/resistant. Therefore (Eduati et al) derived linear models, that finds multiple parameters at the same time.