Overview

Transcriptome profiling followed by differential gene expression analysis often leads to lists of genes that are hard to analyze and interpret. Downstream analysis tools can be used to summarize deregulation events into a smaller set of biologically interpretable features. In particular, methods that estimate the activity of transcription factors (TFs) from gene expression are commonly used. It has been shown that the transcriptional targets of a TF yield a much more robust estimation of the TF activity than observing the expression of the TF itself. Consequently, for the estimation of transcription factor activities, a network of transcriptional regulation is required in combination with a statistical algorithm that summarizes the expression of the target genes into a single activity score. Over the years, many different regulatory networks and statistical algorithms have been developed, mostly in a fixed combination of one network and one algorithm. To systematically evaluate both networks and algorithms, we developed decoupleR , an R package that allows users to apply efficiently any combination provided.

As an initial set of regulatory networks, we integrated the following resources:

And the following TF activity inference methods:

Estimation based on enrichment of sets on rankings

Estimation from linear models

  • SCIRA: Linear models to estimate TF activities from gene expression as defined here

  • pscira: Linear combination of gene expression based on mode of regulation followed by a comparison to a random null model.

Estimation from statistics

  • mean: Weighted mean that allows the use of directions and contribution weights.

  • normalized_mean: Similar as above, however the final score is corrected based on a null (random) model

We benchmarked an initial set of 84 combinations, comprising 7 methods and 13 networks.

We evaluated the precision of different combinations in recovering perturbed TFs from different collections of gene expression datasets. Additionally, we tested the effects of combining multiple sources and estimations. We set up the package in a modular way which makes it easy and intuitive to extend it with further statistics or networks. We invite the community to participate by implementing their own statistics or integrating their gene regulatory network. With the decoupleR package, we lay the foundation for a crowdsourced systematic assessment of transcription factor activity estimation from transcriptomics data.

Installation instructions

Get the latest stable R release from CRAN.

Then install development version from GitHub with:

# install.packages("devtools")
devtools::install_github("saezlab/decoupleR")

Usage

Load packaga and data

library(decoupleR)

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    dplyr::glimpse()
#>  num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#>   ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"

network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    dplyr::glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf         <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOX…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
#> $ target     <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", …
#> $ mor        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

Decouple wrapper

decouple() allows access to all decoupleR available statistics in one place. Statistic functions inside decoupleR always return a tidy tibble that can be easily processed with the tools provide by the tidyverse ecosystem.

decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "mean", "pscira", "scira", "viper"),
    args = list(
        gsva = list(verbose = FALSE),
        mean = list(.mor = "mor", .likelihood = "likelihood"),
        pscira = list(.mor = "mor"),
        scira = list(.mor = "mor"),
        viper = list(.mor = "mor", .likelihood = "likelihood", verbose = FALSE)
    )
)
#> # A tibble: 112 x 7
#>    run_id statistic tf    condition    score statistic_time p_value
#>    <chr>  <chr>     <chr> <chr>        <dbl> <drtn>           <dbl>
#>  1 1      gsva      FOXO4 GSM2753335 -0.380  5.346824 secs       NA
#>  2 1      gsva      FOXO4 GSM2753336 -0.300  5.346824 secs       NA
#>  3 1      gsva      FOXO4 GSM2753337  0.239  5.346824 secs       NA
#>  4 1      gsva      FOXO4 GSM2753338  0.0907 5.346824 secs       NA
#>  5 1      gsva      NFIC  GSM2753335 -0.0845 5.346824 secs       NA
#>  6 1      gsva      NFIC  GSM2753336  0.0778 5.346824 secs       NA
#>  7 1      gsva      NFIC  GSM2753337 -0.260  5.346824 secs       NA
#>  8 1      gsva      NFIC  GSM2753338  0.281  5.346824 secs       NA
#>  9 1      gsva      RFXAP GSM2753335 -0.810  5.346824 secs       NA
#> 10 1      gsva      RFXAP GSM2753336 -0.472  5.346824 secs       NA
#> # … with 102 more rows

Individual parts

In turn, we recognize that the use of individual statistics may be of interest. Therefore, these are also exported and ready for use. All statistics follow the same design pattern and arguments, so moving between statistics could be very comfortable.

# viper call is equivalent to the one made by decouple() above.
run_viper(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    .likelihood = "likelihood",
    verbose = FALSE
)
#> # A tibble: 12 x 5
#>    statistic tf     condition    score statistic_time 
#>    <chr>     <chr>  <chr>        <dbl> <drtn>         
#>  1 viper     NFIC   GSM2753335  0.0696 0.01740503 secs
#>  2 viper     NFIC   GSM2753336 -0.0265 0.01740503 secs
#>  3 viper     NFIC   GSM2753337 -0.516  0.01740503 secs
#>  4 viper     NFIC   GSM2753338 -0.543  0.01740503 secs
#>  5 viper     SMAD3  GSM2753335  0.176  0.01740503 secs
#>  6 viper     SMAD3  GSM2753336  0.0426 0.01740503 secs
#>  7 viper     SMAD3  GSM2753337  0.219  0.01740503 secs
#>  8 viper     SMAD3  GSM2753338  0.142  0.01740503 secs
#>  9 viper     TFAP2A GSM2753335  0.722  0.01740503 secs
#> 10 viper     TFAP2A GSM2753336  0.582  0.01740503 secs
#> 11 viper     TFAP2A GSM2753337  0.462  0.01740503 secs
#> 12 viper     TFAP2A GSM2753338  0.330  0.01740503 secs

Contributing to decoupleR

Are you interested in adding a new statistical method or collaborating in the development of internal tools that allow the extension of the package? Please check out our contribution guide.


Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.