Runs checks on the input data and simplifies the prior knowledge network (PKN). Simplification removes (1) nodes that are not reachable from signaling nodes and (2) interactions between transcription factors (TFs) and target genes when the target gene does not respond or its response contradicts the change in TF activity. Optionally, additional TF activities are estimated by network optimization with CARNIVAL, and the TF-gene interactions are filtered again.
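Step (1) can be pictured as a breadth-first search over the directed PKN. The base-R sketch below only illustrates that idea and is not cosmosR's own code. It assumes the network is an edge list with columns source, interaction and target, uses made-up node names, and defines two hypothetical helpers, reachable_within() and prune_unreachable(). An interaction is kept only when both endpoints lie within max_depth steps of a start node, which mirrors the role of maximum_network_depth.

```r
# Toy prior knowledge network (hypothetical nodes; assumed columns source/interaction/target)
pkn <- data.frame(
  source      = c("S1", "A", "B", "C", "X"),
  interaction = c(1, 1, -1, 1, 1),
  target      = c("A", "B", "C", "D", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first search returning every node reachable
# from start_nodes in at most max_depth steps along directed edges
reachable_within <- function(network, start_nodes, max_depth) {
  visited  <- unique(start_nodes)
  frontier <- visited
  depth    <- 0
  while (length(frontier) > 0 && depth < max_depth) {
    next_nodes <- network$target[network$source %in% frontier]
    frontier   <- setdiff(unique(next_nodes), visited)  # only newly reached nodes
    visited    <- c(visited, frontier)
    depth      <- depth + 1
  }
  visited
}

# Hypothetical helper: keep interactions whose source and target are both reachable
prune_unreachable <- function(network, start_nodes, max_depth) {
  keep <- reachable_within(network, start_nodes, max_depth)
  network[network$source %in% keep & network$target %in% keep, , drop = FALSE]
}

pruned <- prune_unreachable(pkn, start_nodes = "S1", max_depth = 2)
# keeps only S1 -> A and A -> B; X -> Y is never reachable from S1
```

The cut-off bounds the search depth rather than the path length of every kept edge, so the sketch is a simplification of what the package logs as "removing nodes that are not reachable from inputs within N steps".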

```r
preprocess_COSMOS_signaling_to_metabolism(
  meta_network = meta_network,
  signaling_data,
  metabolic_data,
  diff_expression_data,
  diff_exp_threshold = 1,
  maximum_network_depth = 8,
  expressed_genes = names(diff_expression_data)[!is.na(diff_expression_data)],
  remove_unexpressed_nodes = TRUE,
  filter_tf_gene_interaction_by_optimization = TRUE,
  CARNIVAL_options = default_CARNIVAL_options()
)
```

## Arguments

meta_network

Prior knowledge network (PKN). By default COSMOS uses a PKN derived from Omnipath, STITCHdb and Recon3D. See details on the data meta_network.

tf_regulon

Collection of transcription factor - target interactions. A default collection from dorothea can be obtained with the load_tf_regulon_dorothea function.

signaling_data

Numerical vector, where names are signaling nodes in the PKN and values are from {1, 0, -1}. Continuous data will be discretized using the sign function.

metabolic_data

Numerical vector, where names are metabolic nodes in the PKN and values are continuous values that represent log2 fold changes or t-values from a differential analysis. These values are compared to the simulation results (simulated nodes can take the values -1, 0 or 1).

diff_expression_data

(Optional) numerical vector that represents the results of a differential gene expression analysis. Names are gene names given as EntrezIDs prefixed with an X, and values are log fold changes or t-values. convert_genesymbols_to_entrezid can be used for the conversion. The “diff_exp_threshold” parameter decides which genes changed significantly. Genes with NA values are considered not expressed and are removed from the TF-gene expression interactions.

diff_exp_threshold

Threshold (default: 1) used to binarize the values of “diff_expression_data”.

maximum_network_depth

Integer > 0 (default: 8). Nodes that are further than “maximum_network_depth” steps from the signaling nodes on the directed graph of the PKN are considered non-reachable and are removed.

expressed_genes

Character vector. Names of nodes that are expressed. By default, all nodes that appear in diff_expression_data with a numeric value are considered expressed (i.e., nodes with NA are removed).

remove_unexpressed_nodes

If TRUE (default), removes nodes from the PKN that are not expressed; see input “expressed_genes”.

filter_tf_gene_interaction_by_optimization

If TRUE (default), runs a network optimization that estimates the activity of TFs not included in the inputs and checks the consistency between the estimated activity and the change in gene expression. Interactions where TF activity and gene expression are inconsistent are removed.

CARNIVAL_options

List that controls the options of CARNIVAL. See details in default_CARNIVAL_options.
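The input handling described above (sign discretization of signaling data, threshold binarization of expression data, the default expressed_genes, and removal of TF-target interactions that disagree with TF activity) can be sketched in base R. This is an illustration under assumptions, not cosmosR's internal code: the IDs are made up, the regulon layout (columns tf, target, mor for mode of regulation) is assumed, a gene counts as changed only when its absolute value strictly exceeds the threshold, TFs without a known non-zero activity are left untouched, and binarize_with_threshold() and filter_inconsistent_tf_targets() are hypothetical helpers. The CARNIVAL step that estimates further TF activities is not reproduced.

```r
# Hypothetical toy inputs; names are EntrezIDs prefixed with "X" (IDs are made up)
signaling_data <- c(X1 = 2.3, X2 = -0.4, X3 = 0)
diff_expression_data <- c(X10 = 3.1, X11 = -0.2, X12 = -2.5, X13 = NA)

# Continuous signaling values are discretized with sign(): 1, -1 or 0
signaling_data_bin <- sign(signaling_data)

# Default of expressed_genes: every gene with a non-NA value
expressed_genes <- names(diff_expression_data)[!is.na(diff_expression_data)]

# Hypothetical helper: drop NA (not expressed) genes, then keep the sign only
# where |value| > threshold (strict inequality is an assumption), else 0
binarize_with_threshold <- function(x, threshold = 1) {
  x <- x[!is.na(x)]
  sign(x) * as.numeric(abs(x) > threshold)
}
diff_expression_bin <- binarize_with_threshold(diff_expression_data, threshold = 1)

# Toy TF regulon with an assumed layout: tf, target, mor (mode of regulation)
tf_regulon <- data.frame(
  tf     = c("X1", "X1", "X2", "X1", "X99"),
  target = c("X10", "X12", "X12", "X11", "X10"),
  mor    = c(1, 1, 1, 1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove TF-target interactions whose target does not
# respond (0 or missing) or moves against sign(TF activity) * mor.
# TFs with no known, non-zero activity keep all their interactions (assumption).
filter_inconsistent_tf_targets <- function(regulon, tf_activity, expression_bin) {
  act        <- unname(tf_activity[regulon$tf])         # NA if activity unknown
  observed   <- unname(expression_bin[regulon$target])  # NA if target missing
  known      <- !is.na(act) & act != 0
  expected   <- sign(act) * regulon$mor
  consistent <- !is.na(observed) & observed != 0 & observed == expected
  regulon[!known | consistent, , drop = FALSE]
}

filtered <- filter_inconsistent_tf_targets(tf_regulon, signaling_data_bin,
                                           diff_expression_bin)
# keeps X1 -> X10 and X2 -> X12 (consistent) and X99 -> X10 (activity unknown);
# drops X1 -> X12 (contradictory) and X1 -> X11 (target does not respond)
```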

## Value

cosmos_data object with the following fields:

meta_network

Filtered PKN

tf_regulon

TF - target regulatory network

signaling_data_bin

Binarized signaling data

metabolic_data

Metabolomics data

diff_expression_data_bin

Binarized gene expression data

optimized_network

Initial optimized network if filter_tf_gene_interaction_by_optimization is TRUE

## See also

meta_network for the meta PKN, load_tf_regulon_dorothea for the TF regulon, convert_genesymbols_to_entrezid for gene conversion, runCARNIVAL.

## Examples

```r
CARNIVAL_options <- cosmosR::default_CARNIVAL_options()
CARNIVAL_options$solver <- "lpSolve"
test_for <- preprocess_COSMOS_signaling_to_metabolism(meta_network = toy_network,
  signaling_data = toy_signaling_input,
  metabolic_data = toy_metabolic_input,
  diff_expression_data = toy_RNA,
  maximum_network_depth = 15,
  remove_unexpressed_nodes = TRUE,
  CARNIVAL_options = CARNIVAL_options
)
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:many mapping between keys and columns
#> [1] "COSMOS: all 2 signaling nodes from data were found in the meta PKN"
#> [1] "COSMOS: all 3 metabolic nodes from data were found in the meta PKN"
#> [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
#> [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"
#> [1] "COSMOS: removing unexpressed nodes from PKN..."
#> [1] "COSMOS: 0 interactions removed"
#> [1] "COSMOS: removing nodes that are not reachable from inputs within 15 steps"
#> [1] "COSMOS: 10 from  19 interactions are removed from the PKN"
#> [1] "COSMOS: 1 input/measured nodes are not in PKN any more: XMetab__190___c____ and 0 more."
#> [1] "COSMOS: removing nodes that are not observable by measurements within 15 steps"
#> [1] "COSMOS: 0 from  9 interactions are removed from the PKN"
#> [1] "COSMOS:  1 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
#> [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
#> [1] "lpSolve does not scale well with large PKNs. This solver is mainly for testing purposes. To run COSMSO, we recommend using cplex, or cbc solvers."
#> Writing constraints...
#> Solving LP problem...
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   enter Problem = col_character()
#> )
#> [1] "COSMOS:  0 interactions are removed from the PKN based on consistency check between TF activity and gene expression"
#> COSMOS: The following signaling nodes are not found in the PKN and will be removed from input: X3725 and 0 more.
#> [1] "COSMOS: all 2 metabolic nodes from data were found in the meta PKN"
#> [1] "COSMOS: 4653 of the 15919 genes in expression data were found as transcription factor target"
#> [1] "COSMOS: 4653 of the 5303 transcription factor targets were found in expression data"
```