Abstract

This vignette describes how to infer transcription factor activity from bulk transcriptome data by using DoRothEA’s curated regulons with viper.

Introduction

DoRothEA is a comprehensive resource containing a curated collection of transcription factors (TFs) and their transcriptional targets. The set of genes regulated by a specific transcription factor is known as a regulon. DoRothEA’s regulons were gathered from different types of evidence. Each TF-target interaction is assigned a confidence level based on the amount of supporting evidence. The confidence levels range from A (highest confidence) to E (lowest confidence) (Garcia-Alonso et al. 2019). Although DoRothEA was originally developed for human data, it can also be applied to mouse data, with comparable performance but better coverage than dedicated mouse regulons (Holland, Szalai, and Saez-Rodriguez 2019).

DoRothEA regulons are usually coupled with the statistical method VIPER (Alvarez et al. 2016). In this context, the activity of a TF is computed from the mRNA expression levels of its targets. We can therefore consider TF activity as a proxy for a given transcriptional state (Dugourd and Saez-Rodriguez 2019). However, it is up to the user to decide which statistical method to use. Alternatives include, for instance, classical Gene Set Enrichment Analysis or a simple mean statistic.
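To make the "simple mean statistic" alternative concrete, here is a minimal base R sketch on made-up data. The expression values, the TF and gene names, and the helper `mean_activity()` are hypothetical and not part of the dorothea package. The long-format table with a `mor` (mode of regulation) column is only loosely modeled on how DoRothEA stores interactions. This is an illustration of the idea, not a replacement for VIPER.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up values).
expr <- matrix(
  c( 2.0,  1.0, -1.0,
     1.5,  0.5, -0.5,
    -1.0,  0.0,  2.0,
     0.2, -0.3,  0.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Toy regulon table in long format (hypothetical TFs and targets).
# mor = mode of regulation: +1 activation, -1 repression.
toy_regulon <- data.frame(
  tf     = c("TF1", "TF1", "TF1", "TF2", "TF2"),
  target = c("geneA", "geneB", "geneC", "geneC", "geneD"),
  mor    = c(1, 1, -1, 1, 1),
  stringsAsFactors = FALSE
)

# Standardize each gene across samples so that genes are comparable.
z <- t(scale(t(expr)))

# Activity of one TF = mean z-score of its measured targets, signed by mor.
mean_activity <- function(tf_name) {
  reg <- toy_regulon[toy_regulon$tf == tf_name, ]
  reg <- reg[reg$target %in% rownames(z), ]
  colMeans(z[reg$target, , drop = FALSE] * reg$mor)
}

# Result: one row per TF, one column per sample.
tfs <- unique(toy_regulon$tf)
activity <- t(sapply(tfs, mean_activity))
activity
```

In this toy setting, a TF scores high in a sample when its activated targets are above their average expression and its repressed targets are below it.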

Installation

First of all, you need a current version of R (http://www.r-project.org). In addition, you need dorothea, a freely available package hosted on Bioconductor (http://bioconductor.org/) and GitHub (https://github.com/saezlab/dorothea).

You can install it by running the following commands on an R console:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("dorothea")

We also load the packages required to run this vignette:

## We load the required packages
library(dorothea)
library(bcellViper)
library(dplyr)
library(viper)

Example of usage

Following the vignette of the viper package, we demonstrate how to combine viper with regulons from DoRothEA.

Accessing example data and DoRothEA regulons

Similar to the viper vignette, we use the gene expression matrix from the bcellViper package. More information about the gene expression matrix is available in the bcellViper package documentation. The regulons from DoRothEA are provided within the dorothea package and can be accessed via the data() function. As the gene expression matrix contains human data, we load the human version of DoRothEA.

# accessing expression data from bcellViper
data(bcellViper, package = "bcellViper")

# accessing (human) dorothea regulons
# for mouse regulons: data(dorothea_mm, package = "dorothea")
data(dorothea_hs, package = "dorothea")

Running VIPER with DoRothEA regulons

We implemented a wrapper for the viper function that can deal with different input types such as a matrix, a data frame, an ExpressionSet or a Seurat object (see the dedicated vignette for single-cell analysis). We subset DoRothEA to the confidence levels A and B to include only the high-quality regulons.

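# keep only the high-confidence interactions (levels A and B)
regulons <- dorothea_hs %>%
  filter(confidence %in% c("A", "B"))

# dset is the ExpressionSet loaded by data(bcellViper)
tf_activities <- run_viper(dset, regulons,
                           options = list(method = "scale", minsize = 4,
                                          eset.filter = FALSE, cores = 1,
                                          verbose = FALSE))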
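The call above combines a confidence filter with a minimum regulon size (minsize = 4). The base R sketch below illustrates, on made-up data, how these two filters could interact. It assumes that the size threshold is applied to the targets actually present in the expression data, which is our reading of the option and not something this vignette states. All TF names, gene names and variable names are hypothetical.

```r
# Toy regulon table (hypothetical TFs and targets), already limited to
# confidence levels A and B.
toy_regulons <- data.frame(
  tf         = c(rep("TF1", 5), rep("TF2", 3)),
  confidence = c(rep("A", 5), rep("B", 3)),
  target     = c(paste0("g", 1:5), paste0("g", 4:6)),
  stringsAsFactors = FALSE
)

# Genes assumed to be measured in the expression matrix.
measured_genes <- paste0("g", 1:5)

# Minimum number of measured targets a regulon needs to be kept.
minsize <- 4

# Count, per TF, the targets that are present in the data.
in_data   <- toy_regulons[toy_regulons$target %in% measured_genes, ]
n_targets <- table(in_data$tf)

# TFs whose regulons are large enough to be scored.
kept_tfs <- names(n_targets)[n_targets >= minsize]
kept_tfs
```

Under this assumption, restricting DoRothEA to levels A and B shrinks each regulon, so some TFs may fall below the size threshold and drop out of the result.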

Session info

## R version 4.0.1 (2020-06-06)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] viper_1.22.0        dplyr_1.0.0         bcellViper_1.24.0  
## [4] Biobase_2.48.0      BiocGenerics_0.34.0 dorothea_1.0.1     
## [7] BiocStyle_2.16.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6        compiler_4.0.1      pillar_1.4.4       
##  [4] BiocManager_1.30.10 mixtools_1.2.0      class_7.3-17       
##  [7] tools_4.0.1         digest_0.6.25       lattice_0.20-41    
## [10] evaluate_0.14       memoise_1.1.0       lifecycle_0.2.0    
## [13] tibble_3.0.1        pkgconfig_2.0.3     rlang_0.4.6        
## [16] Matrix_1.2-18       yaml_2.2.1          pkgdown_1.5.1      
## [19] xfun_0.14           e1071_1.7-3         stringr_1.4.0      
## [22] knitr_1.28          desc_1.2.0          generics_0.0.2     
## [25] fs_1.4.1            vctrs_0.3.1         grid_4.0.1         
## [28] segmented_1.1-0     rprojroot_1.3-2     tidyselect_1.1.0   
## [31] glue_1.4.1          R6_2.4.1            survival_3.1-12    
## [34] rmarkdown_2.2       bookdown_0.19       kernlab_0.9-29     
## [37] purrr_0.3.4         magrittr_1.5        splines_4.0.1      
## [40] backports_1.1.7     htmltools_0.4.0     ellipsis_0.3.1     
## [43] MASS_7.3-51.6       assertthat_0.2.1    KernSmooth_2.23-17 
## [46] stringi_1.4.6       crayon_1.3.4

References

Alvarez, Mariano J, Yao Shen, Federico M Giorgi, Alexander Lachmann, B Belinda Ding, B Hilda Ye, and Andrea Califano. 2016. “Functional Characterization of Somatic Mutations in Cancer Using Network-Based Inference of Protein Activity.” Nature Genetics 48 (8). Springer Science and Business Media LLC: 838–47. https://doi.org/10.1038/ng.3593.

Dugourd, Aurelien, and Julio Saez-Rodriguez. 2019. “Footprint-Based Functional Analysis of Multiomic Data.” Current Opinion in Systems Biology 15 (June). Elsevier BV: 82–90. https://doi.org/10.1016/j.coisb.2019.04.002.

Garcia-Alonso, Luz, Christian H. Holland, Mahmoud M. Ibrahim, Denes Turei, and Julio Saez-Rodriguez. 2019. “Benchmark and Integration of Resources for the Estimation of Human Transcription Factor Activities.” Genome Research 29 (8). Cold Spring Harbor Laboratory: 1363–75. https://doi.org/10.1101/gr.240663.118.

Holland, Christian H., Bence Szalai, and Julio Saez-Rodriguez. 2019. “Transfer of Regulatory Knowledge from Human to Mouse for Functional Genomics Analysis.” Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, September. Elsevier BV, 194431. https://doi.org/10.1016/j.bbagrm.2019.194431.