Calculates regulatory activities using GSVA.
Usage
run_gsva(
  mat,
  network,
  .source = source,
  .target = target,
  verbose = FALSE,
  method = c("gsva", "plage", "ssgsea", "zscore"),
  minsize = 5L,
  maxsize = Inf,
  ...
)
Arguments
- mat
Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. rownames(mat) must have at least one intersection with the elements in the network .target column.
- network
Tibble or dataframe with edges and their associated metadata.
- .source
Column with source nodes.
- .target
Column with target nodes.
- verbose
Gives information about each calculation step. Default: FALSE.
- method
Method to employ in the estimation of gene-set enrichment scores per sample. By default this is set to "gsva" (Hänzelmann et al., 2013). Further available methods are "plage", "ssgsea" and "zscore". Read more in the manual of GSVA::gsva.
- minsize
Integer indicating the minimum number of targets per source. Must be greater than 0.
- maxsize
Integer indicating the maximum number of targets per source.
- ...
Arguments passed on to GSVA::gsvaParam, GSVA::ssgseaParam
assay
The name of the assay to use in case exprData is a multi-assay container, otherwise ignored. By default, the first assay is used.
annotation
The name of a Bioconductor annotation package for the gene identifiers occurring in the row names of the expression data matrix. This can be used to map gene identifiers occurring in the gene sets if those are provided in a GeneSetCollection. By default gene identifiers used in expression data matrix and gene sets are matched directly.
kcdf
Character vector of length 1 denoting the kernel to use during the non-parametric estimation of the cumulative distribution function of expression levels across samples. By default, kcdf="Gaussian" which is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs or log-TPMs. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to kcdf="Poisson".
tau
Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the GSVA (Hänzelmann et al., 2013) method. The default value is 1 as described in the paper.
maxDiff
Logical vector of length 1 which offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic.
FALSE: ES is calculated as the maximum distance of the random walk from 0.
TRUE (the default): ES is calculated as the magnitude difference between the largest positive and negative random walk deviations.
absRanking
Logical vector of length 1 used only when maxDiff=TRUE. When absRanking=FALSE (default) a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When absRanking=TRUE the original Kuiper statistic, which sums the largest positive and negative random walk deviations, is used. In this latter case, gene sets with genes enriched on either extreme (high or low) will be regarded as 'highly' activated.
alpha
Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the ssGSEA (Barbie et al., 2009) method. The default value is 0.25 as described in the paper.
normalize
Logical vector of length 1; if TRUE runs the ssGSEA method from Barbie et al. (2009) normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. Otherwise this last normalization step is skipped.
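To make the effect of minsize and maxsize described above more concrete, the following base R sketch counts targets per source and keeps the sources inside the size bounds. It assumes that only targets present in rownames(mat) are counted; that is an assumption of this sketch, not something stated on this page. The toy network, the mat_rows vector and all values are invented for illustration.

```r
# Toy network in the long format expected by `network` (values invented).
net <- data.frame(
  source = c("T1", "T1", "T1", "T2", "T2", "T3"),
  target = c("g1", "g2", "g3", "g2", "g9", "g1")
)

# Row names of a hypothetical `mat`; note that g9 is absent from it.
mat_rows <- c("g1", "g2", "g3", "g4")

# Assumption: only targets that appear in the matrix rows are counted.
net_in_mat <- net[net$target %in% mat_rows, ]

# Number of remaining targets per source, as a named integer vector.
n_targets <- lengths(split(net_in_mat$target, net_in_mat$source))

minsize <- 2L
maxsize <- Inf

# Sources whose target count lies within [minsize, maxsize].
kept <- names(n_targets)[n_targets >= minsize & n_targets <= maxsize]
kept
```

With these toy inputs only T1 has at least two targets in the matrix, so it is the only source kept.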
Value
A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:
- statistic: Indicates which method is associated with which score.
- source: Source nodes of network.
- condition: Condition representing each column of mat.
- score: Regulatory activity (enrichment score).
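Because the result is in long format, a common follow-up is to reshape it into a source-by-condition matrix. The sketch below does this in base R on a small hand-made data frame that mimics the four columns listed above; the scores are invented and are not output of run_gsva.

```r
# Hand-made stand-in for the long-format result (scores are invented).
acts <- data.frame(
  statistic = "gsva",
  source    = c("T1", "T1", "T2", "T2"),
  condition = c("S01", "S02", "S01", "S02"),
  score     = c(0.2, 0.5, -0.1, 0.3)
)

# Each (source, condition) pair holds a single score, so mean() simply
# returns that value; sources become rows and conditions become columns.
wide <- with(acts, tapply(score, list(source, condition), mean))
wide
```

If the result holds more than one statistic, filtering on the statistic column first keeps a single score per cell.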
Details
GSVA (Hänzelmann et al., 2013) starts by transforming the input molecular
readouts in mat to a readout-level statistic using Gaussian kernel estimation
of the cumulative distribution function. Then, readout-level statistics are
ranked per sample and normalized to up-weight the two tails of the rank
distribution. Afterwards, an enrichment score (gsva) is calculated
using a running sum statistic that is normalized by subtracting the largest
negative estimate from the largest positive one.
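The ranking and running-sum steps described above can be illustrated with a toy base R function. This is a simplified sketch for one sample and one target set, not the code used by the GSVA package: the function name running_sum_es is hypothetical, the kernel estimation step is skipped, and the tail weights are a simplified form of the symmetric rank weighting.

```r
# Toy sketch of a max-difference running-sum score (not GSVA's own code).
# stat:   numeric readout-level statistic for one sample
# in_set: logical vector, TRUE where the readout is a target of the source
# tau:    exponent applied to the tail weights
running_sum_es <- function(stat, in_set, tau = 1) {
  p <- length(stat)
  ord <- order(stat, decreasing = TRUE)

  # Simplified symmetric weights: largest at both ends of the ranking.
  w <- abs(p / 2 - seq_len(p))

  hit <- in_set[ord]

  # Weighted steps up for set members, uniform steps down for the rest.
  # Assumes the set members carry a non-zero total weight.
  step_hit <- ifelse(hit, w^tau, 0)
  step_hit <- step_hit / sum(step_hit)
  step_miss <- ifelse(hit, 0, 1 / sum(!hit))

  walk <- cumsum(step_hit - step_miss)

  # Largest positive deviation minus the magnitude of the largest negative one.
  max(0, walk) - abs(min(0, walk))
}

stat <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1, g6 = 0)
es_top    <- running_sum_es(stat, c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE))
es_bottom <- running_sum_es(stat, c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE))
```

In this toy input a set concentrated at the top of the ranking scores close to 1, and a set concentrated at the bottom scores close to -1.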
Hänzelmann S. et al. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.
Examples
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))
run_gsva(mat, net, minsize = 1, verbose = FALSE)
#> # A tibble: 72 × 4
#>    statistic source condition score
#>    <chr>     <chr>  <chr>     <dbl>
#>  1 gsva      T1     S01       0.222
#>  2 gsva      T1     S02       0.556
#>  3 gsva      T1     S03       0.667
#>  4 gsva      T1     S04       0.778
#>  5 gsva      T1     S05       0.556
#>  6 gsva      T1     S06       0.667
#>  7 gsva      T1     S07       0.667
#>  8 gsva      T1     S08       0.667
#>  9 gsva      T1     S09       0.889
#> 10 gsva      T1     S10       0.444
#> # ℹ 62 more rows