Installation

# install from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("cosmosR")

# install the development version from GitHub
# install.packages("remotes")
remotes::install_github("saezlab/cosmosR")

Introduction

COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets. COSMOS combines extensive prior knowledge of signaling pathways, metabolic networks, and gene regulation with computational methods that estimate the activities of transcription factors and kinases and that perform network-level causal reasoning. This pipeline can provide mechanistic explanations for experimental observations across multiple omic data sets.

[Figure: data_intro_figure]

First, we load the package:

library(cosmosR)

Tutorial section: signaling to metabolism

In this part, we set up the options for the CARNIVAL run, such as the time limit and the minimum gap tolerance.

Users should provide the path to their CPLEX or cbc executable.

You can inspect the CARNIVAL_options variable to see all the options that can be adjusted.

In this example, we use the built-in solver lpSolve. Be aware that lpSolve should ONLY be used for TESTS. To obtain meaningful results, the best solver is CPLEX, or cbc if CPLEX is not available.

CARNIVAL_options <- cosmosR::default_CARNIVAL_options()
# CARNIVAL_options$solverPath <- "~/Documents/cplex"
# CARNIVAL_options$solver <- "cplex" #or cbc
CARNIVAL_options$solver <- "lpSolve" # built-in solver, for tests only
CARNIVAL_options$timelimit <- 3600
CARNIVAL_options$mipGAP <- 0.05
CARNIVAL_options$threads <- 2
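As a minimal sketch of the solver choice described above, the helper below falls back to lpSolve when no solver executable is found at the given path. The function name pick_solver is hypothetical and not part of cosmosR or CARNIVAL; it only uses base R.

```r
# Hypothetical helper (not part of cosmosR): return the preferred solver name
# if something exists at `solver_path`, otherwise fall back to "lpSolve".
pick_solver <- function(solver_path, preferred = "cplex") {
  has_exec <- is.character(solver_path) && length(solver_path) == 1 &&
    nzchar(solver_path) && file.exists(path.expand(solver_path))
  if (has_exec) preferred else "lpSolve"
}

# The path below is the placeholder used in this tutorial; adapt it to your system.
solver_name <- pick_solver("~/Documents/cplex")
solver_name
```

The returned name could then be assigned to CARNIVAL_options$solver. Keep in mind that a fallback to lpSolve is only suitable for test runs.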

In the next section, we prepare the input to run cosmosR. The signaling inputs are the result of footprint-based TF and kinase activity estimation. For more information on TF activity estimation from transcriptomic data, see: https://github.com/saezlab/transcriptutorial (especially chapter 4).

Here we use a toy PKN. To see the full meta PKN, you can load it with data(meta_network).

The metabolites in the prior knowledge network are identified as XMetab_PUBCHEMidcompartment or XMetab_BIGGidcompartment, for example “XMetab_6804m”. The compartment code follows the BIGG model standard (r, c, e, x, m, l, n, g). Thus, we first need to map whatever metabolite identifier the data uses to the one used in the network. Genes are identified as XENTREZid (in the signaling part of the network) or XGene####__ENTREZid (in the reaction network part of the network).
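The mapping step can be sketched in base R as follows. The helper name to_network_metab_id is hypothetical, and it builds identifiers in the XMetab_PUBCHEMidcompartment form described above. The exact delimiters may differ between package versions, so compare the result against the node names in the network you actually load.

```r
# Hypothetical helper (not part of cosmosR): build a network metabolite
# identifier from a PubChem CID and a BIGG compartment code.
to_network_metab_id <- function(pubchem_id, compartment) {
  valid_compartments <- c("r", "c", "e", "x", "m", "l", "n", "g")
  stopifnot(all(compartment %in% valid_compartments))
  paste0("XMetab_", pubchem_id, compartment)
}

to_network_metab_id("6804", "m")
# [1] "XMetab_6804m"
```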

The maximum network depth defines the maximum number of steps downstream of a kinase/TF in which COSMOS will look for deregulated metabolites. A good first guess for the maximum depth is around 6 (here it is 15 for the toy dataset).
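To illustrate what a depth limit means, here is a small base R sketch of a breadth-first search over an edge list. It is not the cosmosR implementation; the network and the function name reachable_within are made up for illustration.

```r
# Tiny hypothetical network given as an edge list (source -> target)
edges <- data.frame(
  source = c("K1", "A", "B", "C", "X"),
  target = c("A", "B", "C", "M1", "M2"),
  stringsAsFactors = FALSE
)

# Return all nodes reachable from `start` in at most `max_depth` steps
reachable_within <- function(edges, start, max_depth) {
  visited <- start
  frontier <- start
  for (step in seq_len(max_depth)) {
    next_nodes <- unique(edges$target[edges$source %in% frontier])
    next_nodes <- setdiff(next_nodes, visited)
    if (length(next_nodes) == 0) break
    visited <- c(visited, next_nodes)
    frontier <- next_nodes
  }
  visited
}

reachable_within(edges, "K1", 2)
# [1] "K1" "A"  "B"
reachable_within(edges, "K1", 4)
# [1] "K1" "A"  "B"  "C"  "M1"
```

With a depth of 2 the metabolite M1 is not reached, while a depth of 4 includes it. M2 is never reached because it is not downstream of K1.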

The differential expression data is used to filter out TF-target interactions that are incorrect in this context, after a pre-optimisation step.

The list of genes in the differential expression data is also used as a reference to define which genes are expressed (all genes in diff_expression_data are considered expressed, and genes that are not in diff_expression_data are removed from the network).
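The expression-based filter can be pictured with the following base R sketch. It is only an illustration under stated assumptions: the toy network, the pattern used to recognise gene nodes (X followed by digits), and the assumption that the expression vector is named with the same node identifiers are all hypothetical and not taken from the cosmosR source.

```r
# Hypothetical toy network and expression data
network <- data.frame(
  source = c("X1", "X2", "X3", "X1"),
  target = c("X2", "X3", "XMetab_6804m", "X3"),
  stringsAsFactors = FALSE
)
expressed <- c(X1 = 1.2, X3 = -0.4)  # X2 is absent, so treated as unexpressed

# Assumption: gene nodes look like "X" followed by an Entrez id
is_gene <- function(x) grepl("^X[0-9]+$", x)

# Keep an interaction only if every gene node it involves is expressed
keep <- (!is_gene(network$source) | network$source %in% names(expressed)) &
  (!is_gene(network$target) | network$target %in% names(expressed))
filtered <- network[keep, ]
filtered
```

Here the two interactions involving X2 are dropped, and the interactions between X1, X3, and the metabolite remain.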

data(toy_network)
data(toy_signaling_input)
data(toy_metabolic_input)
data(toy_RNA)
test_for <- preprocess_COSMOS_signaling_to_metabolism(
    meta_network = toy_network,
    signaling_data = toy_signaling_input,
    metabolic_data = toy_metabolic_input,
    diff_expression_data = toy_RNA,
    maximum_network_depth = 15,
    remove_unexpressed_nodes = TRUE,
    CARNIVAL_options = CARNIVAL_options
)
## [1] "COSMOS: all 2 signaling nodes from data were found in the meta PKN"
## [1] "COSMOS: all 3 metabolic nodes from data were found in the meta PKN"
## [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
## [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"
## [1] "COSMOS: removing unexpressed nodes from PKN..."
## [1] "COSMOS: 0 interactions removed"
## [1] "COSMOS: removing nodes that are not reachable from inputs within 15 steps"
## [1] "COSMOS: 10 from  19 interactions are removed from the PKN"
## [1] "COSMOS: 1 input/measured nodes are not in PKN any more: XMetab__190___c____ and 0 more."
## [1] "COSMOS: removing nodes that are not observable by measurements within 15 steps"
## [1] "COSMOS: 0 from  9 interactions are removed from the PKN"
## [1] "COSMOS:  1 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "COSMOS:  0 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
## [1] "COSMOS: all 2 metabolic nodes from data were found in the meta PKN"
## [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
## [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"

In this part, we set up the options for the actual run, such as the time limit and the minimum gap tolerance.

The time limit should be much higher here than in the pre-optimisation. You can increase the number of threads if you have many CPUs available.

CARNIVAL_options$timelimit <- 14400
CARNIVAL_options$mipGAP <- 0.05
CARNIVAL_options$threads <- 2
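One possible way to choose the number of threads is to derive it from the number of cores on the machine. The sketch below uses parallel::detectCores() from the base R distribution and leaves one core free, which is a convention chosen here for illustration rather than a cosmosR recommendation.

```r
# Number of cores reported by the system; detectCores() may return NA
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the rest of the system, but use at least one thread
threads <- max(1L, available - 1L)
threads

# This value could then be used as: CARNIVAL_options$threads <- threads
```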

This is where cosmosR runs.

test_result_for <- run_COSMOS_signaling_to_metabolism(data = test_for,
                                                      CARNIVAL_options = CARNIVAL_options)
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."

Finally, we process the results of the first cosmosR run to translate gene and metabolite names.

data(metabolite_to_pubchem)
data(omnipath_ptm)
test_result_for <- format_COSMOS_res(
    test_result_for,
    metab_mapping = metabolite_to_pubchem,
    measured_nodes = unique(c(names(toy_metabolic_input),
                              names(toy_signaling_input))),
    omnipath_ptm = omnipath_ptm
)

Tutorial section: metabolism to signaling

Before we run the metabolism to signaling part, we need to prepare the inputs again.

CARNIVAL_options$timelimit <- 3600
CARNIVAL_options$mipGAP <- 0.05
CARNIVAL_options$threads <- 2

Now that the time limit is set for the pre-optimisation run, we can prepare the inputs.

test_back <- preprocess_COSMOS_metabolism_to_signaling(
    meta_network = toy_network,
    signaling_data = toy_signaling_input,
    metabolic_data = toy_metabolic_input,
    diff_expression_data = toy_RNA,
    maximum_network_depth = 15,
    remove_unexpressed_nodes = FALSE,
    CARNIVAL_options = CARNIVAL_options
)
## [1] "COSMOS: all 2 signaling nodes from data were found in the meta PKN"
## [1] "COSMOS: all 3 metabolic nodes from data were found in the meta PKN"
## [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
## [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"
## [1] "COSMOS: removing nodes that are not reachable from inputs within 15 steps"
## [1] "COSMOS: 1 from  19 interactions are removed from the PKN"
## [1] "COSMOS: 1 input/measured nodes are not in PKN any more: X2305 and 0 more."
## [1] "COSMOS: removing nodes that are not observable by measurements within 15 steps"
## [1] "COSMOS: 11 from  18 interactions are removed from the PKN"
## [1] "COSMOS: 2 input/measured nodes are not in PKN any more: XMetab__124886___c____, XMetab__6426851___c____ and 0 more."
## [1] "COSMOS:  0 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "COSMOS:  0 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
## [1] "COSMOS: all 1 signaling nodes from data were found in the meta PKN"
## [1] "COSMOS: all 1 metabolic nodes from data were found in the meta PKN"
## [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
## [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"

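Then we can run cosmosR to connect metabolism to signaling. The running time here usually needs to be longer, as this problem seems to be harder for CPLEX to solve.

test_result_back <- run_COSMOS_metabolism_to_signaling(data = test_back,
                                                       CARNIVAL_options = CARNIVAL_options)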
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
## [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."

Finally, we can format the result of the backward run as well (same as for the forward run).

test_result_back <- format_COSMOS_res(
    test_result_back,
    metab_mapping = metabolite_to_pubchem,
    measured_nodes = unique(c(names(toy_metabolic_input),
                              names(toy_signaling_input))),
    omnipath_ptm = omnipath_ptm
)

Tutorial section: Merge forward and backward networks and visualise network

Here we simply take the union of the forward and backward runs to create a full network solution looping between signaling, gene regulation and metabolism. Since the result networks of the forward and backward runs overlap, you may optionally want to check whether any node signs are incoherent in the overlap between the two solutions.

full_sif <- as.data.frame(rbind(test_result_for[[1]], test_result_back[[1]]))
full_attributes <- as.data.frame(rbind(test_result_for[[2]], test_result_back[[2]]))

full_sif <- unique(full_sif)
full_attributes <- unique(full_attributes)
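The optional coherence check mentioned above could look like the following base R sketch. The two attribute tables and their column names (node, sign) are hypothetical stand-ins; the actual column names in the formatted cosmosR results may differ, so inspect them with colnames() before adapting this.

```r
# Hypothetical node attributes from a forward and a backward run
att_for <- data.frame(
  node = c("FOXM1", "MYC", "BCAT1"),
  sign = c(1, 1, 1),
  stringsAsFactors = FALSE
)
att_back <- data.frame(
  node = c("MYC", "BCAT1", "GGT1"),
  sign = c(1, -1, 1),
  stringsAsFactors = FALSE
)

# Nodes present in both solutions, with one sign column per run
overlap <- merge(att_for, att_back, by = "node", suffixes = c("_for", "_back"))

# Nodes whose sign differs between the two runs
incoherent <- overlap[overlap$sign_for != overlap$sign_back, ]
incoherent$node
# [1] "BCAT1"
```

Nodes flagged this way deserve a closer look before interpreting the merged network, since the two runs disagree on their direction of change.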

This function will generate a dynamic network plot centered on a given node of the network solution, and connecting it to measured nodes in the given range (here 5 steps).

network_plot <- display_node_neighboorhood(central_node = 'BCAT1', 
                                           sif = full_sif, 
                                           att = full_attributes, 
                                           n = 5)

network_plot

This network represents the flow of activities that can connect FOXM1 up-regulation with glutathione (CID 124886) accumulation. Here, FOXM1 can activate MYC, which in turn activates BCAT1. The activation of BCAT1 can lead to increased production of glutamate (CID 33032), which in turn can be converted to glutathione by GGT enzymes.

It is important to understand that each of these links is hypothetical. They come from a larger pool of potential molecular interactions present in multiple online databases and compiled in OmniPath, STITCH and the Recon metabolic network. They exist in the literature and are interactions that are known to potentially exist in other experimental contexts. Thus, COSMOS compiles all those potential interactions together and proposes a coherent set that can explain the data at hand. Here, such a set of mechanistic hypotheses is plotted as a network connecting FOXM1 and glutathione production.

Those links should however be considered only as potential mechanistic connections, and will need to be further confirmed experimentally.

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cosmosR_0.99.2
## 
## loaded via a namespace (and not attached):
##   [1] segmented_1.3-4      Category_2.56.0      bitops_1.0-7        
##   [4] fs_1.5.0             bit64_4.0.5          doParallel_1.0.16   
##   [7] httr_1.4.2           rprojroot_2.0.2      tools_4.0.2         
##  [10] bslib_0.2.4          utf8_1.2.1           R6_2.5.0            
##  [13] KernSmooth_2.23-20   DBI_1.1.1            BiocGenerics_0.36.1 
##  [16] colorspace_2.0-1     tidyselect_1.1.1     curl_4.3.1          
##  [19] bit_4.0.4            compiler_4.0.2       cli_2.5.0           
##  [22] textshaping_0.3.3    graph_1.68.0         Biobase_2.50.0      
##  [25] CARNIVAL_1.2.0       desc_1.3.0           sass_0.3.1          
##  [28] scales_1.1.1         readr_1.4.0          genefilter_1.72.1   
##  [31] proxy_0.4-25         RBGL_1.66.0          rappdirs_0.3.3      
##  [34] pkgdown_1.6.1        systemfonts_1.0.1    stringr_1.4.0       
##  [37] digest_0.6.27        mixtools_1.2.0       rmarkdown_2.8       
##  [40] pkgconfig_2.0.3      htmltools_0.5.1.1    bcellViper_1.26.0   
##  [43] dbplyr_2.1.1         fastmap_1.1.0        htmlwidgets_1.5.3   
##  [46] rlang_0.4.11         rstudioapi_0.13      RSQLite_2.2.7       
##  [49] visNetwork_2.0.9     jquerylib_0.1.4      generics_0.1.0      
##  [52] jsonlite_1.7.2       viper_1.24.0         dplyr_1.0.6         
##  [55] RCurl_1.98-1.3       magrittr_2.0.1       Matrix_1.3-3        
##  [58] Rcpp_1.0.6           munsell_0.5.0        S4Vectors_0.28.1    
##  [61] fansi_0.4.2          lifecycle_1.0.0      stringi_1.5.3       
##  [64] yaml_2.2.1           UniProt.ws_2.30.0    MASS_7.3-54         
##  [67] BiocFileCache_1.14.0 org.Hs.eg.db_3.12.0  grid_4.0.2          
##  [70] blob_1.2.1           parallel_4.0.2       crayon_1.4.1        
##  [73] lattice_0.20-44      splines_4.0.2        annotate_1.68.0     
##  [76] hms_1.0.0            knitr_1.33           dorothea_1.2.2      
##  [79] pillar_1.6.0         igraph_1.2.6         lpSolve_5.6.15      
##  [82] codetools_0.2-18     stats4_4.0.2         XML_3.99-0.6        
##  [85] glue_1.4.2           evaluate_0.14        vctrs_0.3.8         
##  [88] foreach_1.5.1        gtable_0.3.0         purrr_0.3.4         
##  [91] kernlab_0.9-29       assertthat_0.2.1     cachem_1.0.4        
##  [94] ggplot2_3.3.3        xfun_0.22            xtable_1.8-4        
##  [97] e1071_1.7-6          ragg_1.1.2           class_7.3-19        
## [100] survival_3.2-11      tibble_3.1.1         iterators_1.0.13    
## [103] AnnotationDbi_1.52.0 memoise_2.0.0        IRanges_2.24.1      
## [106] ellipsis_0.3.2       GSEABase_1.52.1