funki.preprocessing
- funki.preprocessing.harmonize(data, vars_use, use_highly_variable=True, recalculate=False, **kwargs)
Executes Harmony batch correction on the PCA embedding of the data set. NOTE: this method will overwrite the DataSet.obsm['X_pca'] matrix.
- Parameters:
data (funki.input.DataSet) – The data set on which Harmony will be executed
vars_use (list[str]) – Variables to correct for (i.e. batches). Must correspond to column(s) defined in DataSet.obs
use_highly_variable (bool, optional) – Whether to use only highly variable genes or all available genes. Only used if PCA has not been computed previously or if recalculate=True, defaults to True
recalculate (bool, optional) – Whether to recalculate the PCA dimensionality reduction, defaults to False
**kwargs (optional) – Other keyword arguments that can be passed to harmonypy.run_harmony()
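The workflow described above (reuse or compute a PCA embedding, correct it for batches, then overwrite X_pca) can be sketched with NumPy alone. This is deliberately NOT Harmony: as a much simpler stand-in, the sketch shifts each batch's centroid in PCA space onto the global centroid, whereas Harmony iteratively soft-clusters cells and applies cluster-specific linear corrections. The names harmonize_sketch and pca_embedding, the plain dicts standing in for DataSet.obs and DataSet.obsm, and the n_comps parameter are hypothetical and not part of the funki API; use_highly_variable is omitted.

```python
import numpy as np


def pca_embedding(X, n_comps):
    """Plain PCA scores via SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]


def harmonize_sketch(X, obs, obsm, vars_use, recalculate=False, n_comps=2):
    """Hypothetical stand-in for the harmonize workflow; NOT the Harmony algorithm."""
    # vars_use must name existing obs columns, as the documentation requires
    missing = [v for v in vars_use if v not in obs]
    if missing:
        raise KeyError(f"vars_use not found in obs: {missing}")

    # Reuse an existing PCA embedding unless recalculation is requested
    if recalculate or "X_pca" not in obsm:
        obsm["X_pca"] = pca_embedding(X, n_comps)
    emb = obsm["X_pca"]

    # One batch label per cell, combining all requested obs columns
    batches = list(zip(*(obs[v] for v in vars_use)))
    global_mean = emb.mean(axis=0)
    corrected = emb.copy()
    for batch in set(batches):
        idx = np.array([i for i, lab in enumerate(batches) if lab == batch])
        # Crude correction: move this batch's centroid onto the global centroid
        corrected[idx] += global_mean - emb[idx].mean(axis=0)

    # Mirror the documented side effect: X_pca is overwritten
    obsm["X_pca"] = corrected
    return corrected


# Usage: two simulated batches, the second shifted by a constant offset
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[3:] += 5.0
obs = {"batch": ["a", "a", "a", "b", "b", "b"]}
obsm = {}
corrected = harmonize_sketch(X, obs, obsm, vars_use=["batch"])
```

After the call, both batches share the same centroid in the (overwritten) X_pca embedding.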
- funki.preprocessing.sc_trans_filter(data, min_genes=None, max_genes=None, mito_pct=None)
Applies quality control filters to a given single-cell transcriptomic data set. Cells can be filtered out based on a minimum and a maximum number of detected genes, as well as on the percentage of mitochondrial genes.
- Parameters:
data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
min_genes (int, optional) – Minimum number of distinct genes for a cell. Cells with fewer genes are filtered out, defaults to None
max_genes (int, optional) – Maximum number of distinct genes for a cell. Cells with more genes are filtered out, defaults to None
mito_pct (int, optional) – Threshold for the percentage of mitochondrial genes. Cells with a percentage above this threshold are filtered out, defaults to None
- Returns:
The resulting filtered data set after applying the specified thresholds
- Return type:
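The three thresholds can be illustrated on a plain NumPy count matrix (cells as rows, genes as columns). This is a minimal sketch, not funki's implementation, and it rests on several assumptions: sc_filter_sketch and mito_prefix are hypothetical names, mitochondrial genes are identified by an "MT-" name prefix (case-insensitive), "percentage of mitochondrial genes" is interpreted as the share of a cell's total counts coming from those genes, and cells exactly at a threshold are kept.

```python
import numpy as np


def sc_filter_sketch(counts, gene_names, min_genes=None, max_genes=None,
                     mito_pct=None, mito_prefix="MT-"):
    """Hypothetical helper: boolean mask of cells (rows) passing all thresholds."""
    counts = np.asarray(counts, dtype=float)
    # Number of distinct genes detected (non-zero counts) per cell
    n_genes = (counts > 0).sum(axis=1)
    keep = np.ones(counts.shape[0], dtype=bool)
    if min_genes is not None:
        keep &= n_genes >= min_genes
    if max_genes is not None:
        keep &= n_genes <= max_genes
    if mito_pct is not None:
        # Assumption: mitochondrial genes are recognised by their name prefix
        is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])
        total = counts.sum(axis=1)
        mito = counts[:, is_mito].sum(axis=1)
        # Empty cells are treated as 0 % mitochondrial instead of dividing by zero
        pct = np.divide(100.0 * mito, total, out=np.zeros_like(total), where=total > 0)
        keep &= pct <= mito_pct
    return keep


genes = ["MT-CO1", "GAPDH", "ACTB", "CD3E"]
counts = np.array([
    [0, 5, 3, 2],  # 3 genes, 0 % mitochondrial -> kept
    [8, 1, 1, 0],  # 3 genes, 80 % mitochondrial -> removed by mito_pct
    [0, 4, 0, 0],  # 1 gene -> removed by min_genes
])
mask = sc_filter_sketch(counts, genes, min_genes=2, mito_pct=20)
filtered = counts[mask]
```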
- funki.preprocessing.sc_trans_normalize_total(data, target_sum=None, log_transform=False)
Normalizes the total counts per cell in a single-cell data set. The normalization scales the counts so that the counts of all genes in a cell add up to the specified target_sum (1e6 by default, equivalent to CPM normalization). If log_transform=True, it also applies a \(\log(X+1)\) transformation to the resulting normalized data.
- Parameters:
data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
target_sum (int | float, optional) – The target total count per cell to normalize to, e.g. 1e6 is equivalent to CPM normalization, defaults to None
log_transform (bool, optional) – Whether to apply log-transformation after normalizing the data, defaults to False
- Returns:
The resulting normalized (and log-transformed, if applicable) data set
- Return type:
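The scaling described above divides each cell's counts by that cell's total and multiplies by target_sum, optionally followed by \(\log(X+1)\). The sketch below shows this on a NumPy matrix (cells as rows); it is not funki's implementation. normalize_total_sketch is a hypothetical name, and treating target_sum=None as 1e6 follows the prose description rather than any confirmed internal behavior.

```python
import numpy as np


def normalize_total_sketch(counts, target_sum=None, log_transform=False):
    """Hypothetical helper: scale each cell (row) to sum to target_sum."""
    counts = np.asarray(counts, dtype=float)
    if target_sum is None:
        target_sum = 1e6  # assumption: None behaves like CPM, per the description
    totals = counts.sum(axis=1, keepdims=True)
    # Empty cells (total of 0) are left at 0 instead of dividing by zero
    scaled = np.divide(counts * target_sum, totals,
                       out=np.zeros_like(counts), where=totals > 0)
    if log_transform:
        scaled = np.log1p(scaled)  # natural-log log(X + 1)
    return scaled


counts = np.array([[1, 3], [2, 2]])
cpm = normalize_total_sketch(counts)  # rows: [250000, 750000] and [500000, 500000]
logged = normalize_total_sketch(counts, target_sum=10, log_transform=True)
```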