Transcriptome profiling followed by differential gene expression analysis often leads to lists of genes that are hard to analyze and interpret. Downstream analysis tools can be used to summarize deregulation events into a smaller set of biologically interpretable features. In particular, methods that estimate the activity of transcription factors (TFs) from gene expression are commonly used. It has been shown that the transcriptional targets of a TF yield a much more robust estimation of the TF activity than observing the expression of the TF itself. Consequently, for the estimation of transcription factor activities, a network of transcriptional regulation is required in combination with a statistical algorithm that summarizes the expression of the target genes into a single activity score. Over the years, many different regulatory networks and statistical algorithms have been developed, mostly in a fixed combination of one network and one algorithm. To systematically evaluate both networks and algorithms, we developed decoupleR , an R package that allows users to apply efficiently any combination provided.
As an initial set of regulatory networks, we integrated the following resources:
And the following TF activity inference methods:
SCIRA: Linear models to estimate TF activities from gene expression as defined here
pscira: Linear combination of gene expression based on mode of regulation followed by a comparison to a random null model.
mean: Weighted mean that allows the use of directions and contribution weights.
normalized_mean: Similar as above, however the final score is corrected based on a null (random) model
We benchmarked an initial set of 84 combinations, comprising 7 methods and 13 networks.
We evaluated the precision of different combinations in recovering perturbed TFs from different collections of gene expression datasets. Additionally, we tested the effects of combining multiple sources and estimations. We set up the package in a modular way which makes it easy and intuitive to extend it with further statistics or networks. We invite the community to participate by implementing their own statistics or integrating their gene regulatory network. With the decoupleR package, we lay the foundation for a crowdsourced systematic assessment of transcription factor activity estimation from transcriptomics data.
Get the latest stable R
release from CRAN.
Then install development version from GitHub with:
# install.packages("devtools")
devtools::install_github("saezlab/decoupleR")
library(decoupleR)
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
readRDS() %>%
dplyr::glimpse()
#> num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#> ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"
network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
readRDS() %>%
dplyr::glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOX…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
#> $ target <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", …
#> $ mor <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
decouple()
allows access to all decoupleR available statistics in one place. Statistic functions inside decoupleR always return a tidy tibble that can be easily processed with the tools provide by the tidyverse ecosystem.
decouple(
mat = mat,
network = network,
.source = "tf",
.target = "target",
statistics = c("gsva", "mean", "pscira", "scira", "viper"),
args = list(
gsva = list(verbose = FALSE),
mean = list(.mor = "mor", .likelihood = "likelihood"),
pscira = list(.mor = "mor"),
scira = list(.mor = "mor"),
viper = list(.mor = "mor", .likelihood = "likelihood", verbose = FALSE)
)
)
#> # A tibble: 112 x 7
#> run_id statistic tf condition score statistic_time p_value
#> <chr> <chr> <chr> <chr> <dbl> <drtn> <dbl>
#> 1 1 gsva FOXO4 GSM2753335 -0.380 5.346824 secs NA
#> 2 1 gsva FOXO4 GSM2753336 -0.300 5.346824 secs NA
#> 3 1 gsva FOXO4 GSM2753337 0.239 5.346824 secs NA
#> 4 1 gsva FOXO4 GSM2753338 0.0907 5.346824 secs NA
#> 5 1 gsva NFIC GSM2753335 -0.0845 5.346824 secs NA
#> 6 1 gsva NFIC GSM2753336 0.0778 5.346824 secs NA
#> 7 1 gsva NFIC GSM2753337 -0.260 5.346824 secs NA
#> 8 1 gsva NFIC GSM2753338 0.281 5.346824 secs NA
#> 9 1 gsva RFXAP GSM2753335 -0.810 5.346824 secs NA
#> 10 1 gsva RFXAP GSM2753336 -0.472 5.346824 secs NA
#> # … with 102 more rows
In turn, we recognize that the use of individual statistics may be of interest. Therefore, these are also exported and ready for use. All statistics follow the same design pattern and arguments, so moving between statistics could be very comfortable.
# viper call is equivalent to the one made by decouple() above.
run_viper(
mat = mat,
network = network,
.source = "tf",
.target = "target",
.likelihood = "likelihood",
verbose = FALSE
)
#> # A tibble: 12 x 5
#> statistic tf condition score statistic_time
#> <chr> <chr> <chr> <dbl> <drtn>
#> 1 viper NFIC GSM2753335 0.0696 0.01740503 secs
#> 2 viper NFIC GSM2753336 -0.0265 0.01740503 secs
#> 3 viper NFIC GSM2753337 -0.516 0.01740503 secs
#> 4 viper NFIC GSM2753338 -0.543 0.01740503 secs
#> 5 viper SMAD3 GSM2753335 0.176 0.01740503 secs
#> 6 viper SMAD3 GSM2753336 0.0426 0.01740503 secs
#> 7 viper SMAD3 GSM2753337 0.219 0.01740503 secs
#> 8 viper SMAD3 GSM2753338 0.142 0.01740503 secs
#> 9 viper TFAP2A GSM2753335 0.722 0.01740503 secs
#> 10 viper TFAP2A GSM2753336 0.582 0.01740503 secs
#> 11 viper TFAP2A GSM2753337 0.462 0.01740503 secs
#> 12 viper TFAP2A GSM2753338 0.330 0.01740503 secs
Are you interested in adding a new statistical method or collaborating in the development of internal tools that allow the extension of the package? Please check out our contribution guide.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.