funki.preprocessing

funki.preprocessing.harmonize(data, vars_use, use_highly_variable=True, recalculate=False, **kwargs)

Executes Harmony batch correction on the data set PCA embedding. NOTE: this method will overwrite the DataSet.obsm['X_pca'] matrix.

Parameters:

data (funki.input.DataSet) – The data set on which Harmony will be executed
vars_use (list[str]) – Variables to correct for (i.e. batches). Must correspond to column(s) defined in DataSet.obs
use_highly_variable (bool, optional) – Whether to use only highly variable genes or all available genes. Only used if PCA has not been computed previously or if recalculate=True, defaults to True
recalculate (bool, optional) – Whether to recalculate the PCA dimensionality reduction, defaults to False
**kwargs (optional) – Other keyword arguments that can be passed to harmonypy.run_harmony()

funki.preprocessing.sc_pseudobulk(data, sample_col, groups_col=None, mode='sum', **kwargs)

Wrapper over decoupler.pp.pseudobulk

Parameters:

data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
sample_col (str) – Name of the column in data.obs from which to take the sample names
groups_col (str, optional) – Name of the column in data.obs from which to take the group names, defaults to None
mode (str, optional) – Method used to aggregate the counts. Available options are 'sum', 'mean' or 'median', defaults to 'sum'
**kwargs (optional) – Other keyword arguments that can be passed to decoupler.pp.pseudobulk
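
To illustrate what the aggregation modes compute, the following is a minimal pure-Python sketch of pseudobulking. It is not the funki or decoupler implementation: the function pseudobulk, its arguments and the toy data are hypothetical, and plain lists stand in for the DataSet and its obs columns.

```python
from collections import defaultdict
from statistics import mean, median

# Aggregation functions matching the documented 'mode' options.
AGGREGATORS = {"sum": sum, "mean": mean, "median": median}


def pseudobulk(counts, samples, groups=None, mode="sum"):
    """Aggregate per-cell counts into one profile per sample (and group).

    Hypothetical helper for illustration only.
    counts  : list of cells, each a list of per-gene counts
    samples : sample label of each cell (plays the role of sample_col)
    groups  : optional group label of each cell (plays the role of groups_col)
    """
    aggregate = AGGREGATORS[mode]
    buckets = defaultdict(list)
    for i, cell in enumerate(counts):
        key = samples[i] if groups is None else (samples[i], groups[i])
        buckets[key].append(cell)
    # zip(*cells) transposes the cells of a bucket into per-gene columns.
    return {
        key: [aggregate(column) for column in zip(*cells)]
        for key, cells in buckets.items()
    }


counts = [[1, 0], [2, 4], [5, 5], [0, 3]]
samples = ["s1", "s1", "s2", "s2"]
groups = ["T", "B", "T", "T"]

by_sample = pseudobulk(counts, samples)
by_sample_group = pseudobulk(counts, samples, groups)
by_sample_mean = pseudobulk(counts, samples, mode="mean")
```

With groups given, each (sample, group) pair gets its own aggregated profile; without it, all cells of a sample are pooled.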

funki.preprocessing.sc_trans_filter(data, min_genes=None, max_genes=None, mito_pct=None)

Applies quality control filters to a given single-cell transcriptomic data set. Can filter out cells based on a minimum and maximum number of genes as well as based on the percentage of mitochondrial genes.

Parameters:

data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
min_genes (int, optional) – Minimum number of different genes detected in a cell. Cells with fewer genes are filtered out, defaults to None
max_genes (int, optional) – Maximum number of different genes detected in a cell. Cells with more genes are filtered out, defaults to None
mito_pct (int, optional) – Maximum percentage of mitochondrial genes. Cells with a percentage above this threshold are filtered out, defaults to None

Returns:

The resulting filtered data set after applying the specified thresholds

Return type:

funki.input.DataSet
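
The three thresholds can be pictured with a small pure-Python sketch. It is not the funki implementation: the helper filter_cells is hypothetical, mitochondrial genes are assumed to be those whose name starts with "MT-", and the percentage is assumed to be the share of a cell's counts coming from those genes.

```python
def filter_cells(counts, gene_names, min_genes=None, max_genes=None, mito_pct=None):
    """Return the indices of the cells that pass all given thresholds.

    Hypothetical helper for illustration only. A threshold set to None
    is not applied.
    """
    # Assumption: mitochondrial genes are identified by an "MT-" prefix.
    is_mito = [name.upper().startswith("MT-") for name in gene_names]
    kept = []
    for index, cell in enumerate(counts):
        n_genes = sum(1 for value in cell if value > 0)
        total = sum(cell)
        mito = sum(value for value, flag in zip(cell, is_mito) if flag)
        # Assumption: percentage of the cell's counts that are mitochondrial.
        pct = 100.0 * mito / total if total > 0 else 0.0
        if min_genes is not None and n_genes < min_genes:
            continue
        if max_genes is not None and n_genes > max_genes:
            continue
        if mito_pct is not None and pct > mito_pct:
            continue
        kept.append(index)
    return kept


gene_names = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [0, 5, 3, 2],  # 3 genes detected, 0% mitochondrial
    [8, 1, 1, 0],  # 3 genes detected, 80% mitochondrial
    [0, 4, 0, 0],  # 1 gene detected
    [1, 3, 3, 3],  # 4 genes detected, 10% mitochondrial
]

kept_all = filter_cells(counts, gene_names, min_genes=2, max_genes=3, mito_pct=20)
kept_min_only = filter_cells(counts, gene_names, min_genes=2)
```

Here only the first cell passes all three filters; with min_genes alone, only the single-gene cell is dropped.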

funki.preprocessing.sc_trans_normalize_total(data, target_sum=None, log_transform=False)

Normalizes the total counts per cell in a single-cell data set. The normalization scales the counts so that the counts of all genes in a cell add up to the specified target_sum (1e6 by default, equivalent to CPM normalization). If log_transform=True, it also applies a \(\log(X+1)\) transformation to the resulting normalized data.

Parameters:

data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
target_sum (int | float, optional) – Target total counts per cell after normalization, e.g. 1e6 is equivalent to CPM normalization, defaults to None
log_transform (bool, optional) – Whether to apply log-transformation after normalizing the data, defaults to False

Returns:

The resulting normalized (and log-transformed, if applicable) data set

Return type:

funki.input.DataSet
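
The arithmetic described above can be sketched in pure Python. This is not the funki implementation: the helper normalize_total is hypothetical, nested lists stand in for the DataSet, and the default of 1e6 follows the description above.

```python
import math


def normalize_total(counts, target_sum=1e6, log_transform=False):
    """Scale each cell so that its counts add up to target_sum.

    Hypothetical helper for illustration only.
    counts : list of cells, each a list of per-gene raw counts
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells without any counts stay at zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        row = [value * scale for value in cell]
        if log_transform:
            # log1p(x) computes log(x + 1).
            row = [math.log1p(value) for value in row]
        normalized.append(row)
    return normalized


counts = [[1, 3, 0], [10, 0, 10]]
cpm = normalize_total(counts, target_sum=1e6)
log_cpm = normalize_total(counts, target_sum=1e6, log_transform=True)
```

After scaling, every non-empty cell sums to target_sum, and genes with zero counts remain zero after the \(\log(X+1)\) transformation.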
