funki.preprocessing
- funki.preprocessing.harmonize(data, vars_use, use_highly_variable=True, recalculate=False, **kwargs)
- Executes Harmony batch correction on the PCA embedding of the data set. NOTE: this method will overwrite the DataSet.obsm['X_pca'] matrix.
- Parameters:
- data (funki.input.DataSet) – The data set on which Harmony will be executed
- vars_use (list[str]) – Variables to correct for (e.g. batches). Must correspond to column(s) defined in DataSet.obs
- use_highly_variable (bool, optional) – Whether to use only highly variable genes or all available genes. Only used if PCA has not been computed previously or if recalculate=True, defaults to True
- recalculate (bool, optional) – Whether to recalculate the PCA dimensionality reduction, defaults to False
- **kwargs (optional) – Other keyword arguments passed to harmonypy.run_harmony()
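To make "correcting the PCA embedding over vars_use" concrete, the following is a deliberately simplified stand-in written in pure Python: it shifts each batch so that its centroid in PCA space matches the global centroid. This is not Harmony, which iteratively soft-clusters cells and applies cluster-specific linear corrections, and it is not funki's implementation; the function name center_batches and the list-based inputs are hypothetical.

```python
from collections import defaultdict


def center_batches(pca, batches):
    """Shift each batch so its PCA centroid matches the global centroid.

    pca: list of rows (one per cell), each a list of PC coordinates.
    batches: batch label per cell (the role played by a vars_use column).
    NOTE: crude stand-in for illustration only; Harmony itself uses
    soft clustering and cluster-specific corrections.
    """
    n_cells = len(pca)
    n_pcs = len(pca[0])
    # Global centroid across all cells
    global_mean = [sum(row[j] for row in pca) / n_cells for j in range(n_pcs)]
    # Accumulate per-batch sums and cell counts
    sums = defaultdict(lambda: [0.0] * n_pcs)
    counts = defaultdict(int)
    for row, b in zip(pca, batches):
        counts[b] += 1
        for j, v in enumerate(row):
            sums[b][j] += v
    means = {b: [s / counts[b] for s in sums[b]] for b in sums}
    # Remove each cell's batch centroid and add back the global centroid
    return [
        [v - means[b][j] + global_mean[j] for j, v in enumerate(row)]
        for row, b in zip(pca, batches)
    ]


pca = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
batches = ["a", "a", "b", "b"]
corrected = center_batches(pca, batches)
print(corrected)  # [[6.0, 7.0], [8.0, 9.0], [6.0, 7.0], [8.0, 9.0]]
```

After this shift, both batches occupy the same region of PCA space. Like harmonize, a real workflow would then write the corrected coordinates back in place of the original embedding.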
 
 
- funki.preprocessing.sc_pseudobulk(data, sample_col, groups_col=None, mode='sum', **kwargs)
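- Wrapper over decoupler.pp.pseudobulk, which aggregates the single-cell counts into pseudobulk profiles per sample (and per group, if provided).
- Parameters: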
- data ( - funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
- sample_col (str) – Column name in data.obs from which to extract the sample names
- groups_col (str, optional) – Column name in data.obs from which to extract the group names, defaults to None
- mode (str, optional) – Method used to aggregate the counts. Available options are 'sum', 'mean' or 'median', defaults to 'sum'
- **kwargs (optional) – Other keyword arguments passed to decoupler.pp.pseudobulk
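Conceptually, pseudobulking groups cells by their sample label (and group label, if given) and aggregates their counts gene by gene with the chosen mode. The pure-Python sketch below illustrates only this aggregation step; it is not decoupler's implementation, and the pseudobulk function name and list-based inputs are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Supported aggregation modes, mirroring the options listed above
AGGREGATORS = {"sum": sum, "mean": mean, "median": median}


def pseudobulk(counts, samples, groups=None, mode="sum"):
    """Aggregate per-cell count vectors into one profile per sample (and group).

    counts: list of per-cell count vectors (cells x genes).
    samples: sample label per cell (the role of sample_col).
    groups: optional group label per cell (the role of groups_col).
    """
    agg = AGGREGATORS[mode]
    cells_by_key = defaultdict(list)
    for i, s in enumerate(samples):
        # Key on (sample, group) when groups are given, else on sample only
        key = (s, groups[i]) if groups is not None else s
        cells_by_key[key].append(counts[i])
    # zip(*rows) yields one tuple per gene across the cells sharing a key
    return {key: [agg(col) for col in zip(*rows)] for key, rows in cells_by_key.items()}


counts = [[1, 0, 3], [2, 1, 0], [0, 5, 1], [4, 0, 0]]
samples = ["s1", "s1", "s2", "s2"]
print(pseudobulk(counts, samples))  # {'s1': [3, 1, 3], 's2': [4, 5, 1]}
```

Passing groups=["T", "B", "T", "T"] would instead produce one profile per (sample, group) pair, e.g. ('s1', 'T'), ('s1', 'B') and ('s2', 'T').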
 
- funki.preprocessing.sc_trans_filter(data, min_genes=None, max_genes=None, mito_pct=None)
- Applies quality control filters to a single-cell transcriptomic data set. Cells can be filtered out based on a minimum and/or maximum number of detected genes, as well as on the percentage of mitochondrial genes.
- Parameters:
- data ( - funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
- min_genes (int, optional) – Minimum number of distinct genes detected in a cell. Cells below this threshold are filtered out, defaults to None
- max_genes (int, optional) – Maximum number of distinct genes detected in a cell. Cells above this threshold are filtered out, defaults to None
- mito_pct (int, optional) – Maximum percentage of mitochondrial genes. Cells above this threshold are filtered out, defaults to None
 
- Returns:
- The resulting filtered data set after applying the specified thresholds 
- Return type:
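The filtering logic can be sketched in pure Python as below. Several details are assumptions rather than documented funki behaviour: the mitochondrial percentage is computed as the share of a cell's total counts coming from genes whose names start with "MT-" (a human gene-naming convention), thresholds are treated as inclusive (a cell exactly at a threshold is kept), and the hypothetical qc_filter function returns the indices of retained cells instead of a filtered data set.

```python
def qc_filter(counts, gene_names, min_genes=None, max_genes=None, mito_pct=None):
    """Return the indices of cells (rows of counts) that pass all QC thresholds."""
    # ASSUMPTION: mitochondrial genes are identified by an "MT-" name prefix
    is_mito = [g.upper().startswith("MT-") for g in gene_names]
    keep = []
    for i, row in enumerate(counts):
        n_genes = sum(1 for c in row if c > 0)  # distinct genes detected
        total = sum(row)
        mito = sum(c for c, m in zip(row, is_mito) if m)
        pct = 100 * mito / total if total > 0 else 0.0
        # A threshold left as None is simply not applied
        if min_genes is not None and n_genes < min_genes:
            continue
        if max_genes is not None and n_genes > max_genes:
            continue
        if mito_pct is not None and pct > mito_pct:
            continue
        keep.append(i)
    return keep


genes = ["CD3E", "MS4A1", "MT-CO1", "ACTB"]
counts = [
    [5, 0, 1, 4],  # 3 genes, 10% mitochondrial
    [0, 0, 8, 2],  # 2 genes, 80% mitochondrial
    [1, 2, 0, 3],  # 3 genes, 0% mitochondrial
    [0, 0, 0, 1],  # 1 gene, 0% mitochondrial
]
print(qc_filter(counts, genes, min_genes=2, mito_pct=20))  # [0, 2]
```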
 
- funki.preprocessing.sc_trans_normalize_total(data, target_sum=None, log_transform=False)
- Normalizes the total counts per cell in a single-cell data set. The normalization scales the counts so that the counts of all genes in a cell add up to the specified target_sum (1e6 by default, equivalent to CPM normalization). If log_transform=True, it also applies a \(\log(X+1)\) transformation to the resulting normalized data.
- Parameters:
- data ( - funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts
- target_sum (int | float, optional) – The target total count per cell after normalization, e.g. 1e6 is equivalent to CPM normalization, defaults to None
- log_transform (bool, optional) – Whether to apply log-transformation after normalizing the data, defaults to False
 
- Returns:
- The resulting normalized (and log-transformed, if applicable) data set 
- Return type:
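The arithmetic behind total-count normalization is a per-cell rescaling: each count \(x_{ig}\) becomes \(x_{ig} \cdot \text{target\_sum} / \sum_g x_{ig}\), optionally followed by \(\log(X+1)\). A minimal pure-Python sketch follows; the function name normalize_total and the list-based input are hypothetical, the default of 1e6 follows the description above, and all-zero cells are assumed to stay at zero.

```python
import math


def normalize_total(counts, target_sum=1e6, log_transform=False):
    """Scale each cell's counts to sum to target_sum; optionally apply log(x + 1)."""
    out = []
    for row in counts:
        total = sum(row)
        # ASSUMPTION: all-zero cells stay all-zero (avoids division by zero)
        scale = target_sum / total if total > 0 else 0.0
        scaled = [c * scale for c in row]
        if log_transform:
            scaled = [math.log1p(v) for v in scaled]  # log(x + 1)
        out.append(scaled)
    return out


print(normalize_total([[1, 3], [2, 2]], target_sum=100))  # [[25.0, 75.0], [50.0, 50.0]]
```

With the default target_sum of 1e6 each cell's values become counts per million; setting log_transform=True then compresses the dynamic range of highly expressed genes.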