funki.preprocessing

funki.preprocessing.harmonize(data, vars_use, use_highly_variable=True, recalculate=False, **kwargs)

Executes Harmony batch correction on the data set PCA embedding. NOTE: this method will overwrite the DataSet.obsm['X_pca'] matrix.

Parameters:
  • data (funki.input.DataSet) – The data set on which Harmony will be executed

  • vars_use (list[str]) – Variables to correct for (i.e. batches). Must correspond to column(s) defined in DataSet.obs

  • use_highly_variable (bool, optional) – Whether to use only highly variable genes instead of all available genes. Only takes effect if PCA has not been computed previously or if recalculate=True, defaults to True

  • recalculate (bool, optional) – Whether to recalculate the PCA dimensionality reduction, defaults to False

  • **kwargs (optional) – Other keyword arguments that can be passed to harmonypy.run_harmony()

funki.preprocessing.sc_trans_filter(data, min_genes=None, max_genes=None, mito_pct=None)

Applies quality control filters to a given single-cell transcriptomic data set. Can filter out cells based on a minimum and maximum number of genes as well as based on the percentage of mitochondrial genes.

Parameters:
  • data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts

  • min_genes (int, optional) – Minimum number of different genes detected in a cell. Cells with fewer genes will be filtered out, defaults to None

  • max_genes (int, optional) – Maximum number of different genes detected in a cell. Cells with more genes will be filtered out, defaults to None

  • mito_pct (int, optional) – Maximum percentage of mitochondrial genes allowed in a cell. Cells with a percentage above this threshold will be filtered out, defaults to None

Returns:

The resulting filtered data set after applying the specified thresholds

Return type:

funki.input.DataSet
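
The thresholds described above can be illustrated with a minimal NumPy sketch. This is not the funki implementation: the helper qc_filter_mask is hypothetical, it assumes that mitochondrial genes are recognized by an "MT-" name prefix, and it assumes that the mitochondrial percentage is computed over counts rather than over the number of genes.

```python
import numpy as np


def qc_filter_mask(counts, gene_names, min_genes=None, max_genes=None,
                   mito_pct=None):
    """Hypothetical helper: boolean mask of cells (rows) passing the QC filters."""
    counts = np.asarray(counts, dtype=float)

    # Number of different genes detected (non-zero count) in each cell
    n_genes = (counts > 0).sum(axis=1)

    # Assumption: mitochondrial genes carry an 'MT-' prefix in their name
    is_mito = np.array([g.upper().startswith("MT-") for g in gene_names])

    # Assumption: percentage of counts coming from mitochondrial genes;
    # cells with zero total counts get a percentage of 0
    totals = counts.sum(axis=1)
    mito_counts = counts[:, is_mito].sum(axis=1)
    pct_mito = np.divide(100.0 * mito_counts, totals,
                         out=np.zeros_like(totals), where=totals > 0)

    # A threshold left as None is simply not applied
    keep = np.ones(counts.shape[0], dtype=bool)
    if min_genes is not None:
        keep &= n_genes >= min_genes
    if max_genes is not None:
        keep &= n_genes <= max_genes
    if mito_pct is not None:
        keep &= pct_mito <= mito_pct
    return keep


# Toy matrix: 3 cells x 4 genes, the last gene being mitochondrial
gene_names = ["GeneA", "GeneB", "GeneC", "MT-CO1"]
counts = [
    [5, 0, 3, 2],
    [0, 0, 1, 9],
    [4, 4, 4, 0],
]
mask = qc_filter_mask(counts, gene_names, min_genes=3, mito_pct=25)
filtered = np.asarray(counts)[mask]
```

In this toy example the second cell fails both filters: it has only two detected genes, and most of its counts come from the mitochondrial gene.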

funki.preprocessing.sc_trans_normalize_total(data, target_sum=None, log_transform=False)

Normalizes the total counts per cell in a single-cell data set. The normalization scales the counts so that the sum over all genes in a cell adds up to the specified target_sum (1e6 by default, equivalent to CPM normalization). If log_transform=True, it also applies a \(\log(X+1)\) transformation to the resulting normalized data.

Parameters:
  • data (funki.input.DataSet) – A single-cell transcriptomic data set containing raw counts

  • target_sum (int | float, optional) – The target total counts per cell to normalize to, e.g. 1e6 is equivalent to CPM normalization, defaults to None

  • log_transform (bool, optional) – Whether to apply log-transformation after normalizing the data, defaults to False

Returns:

The resulting normalized (and log-transformed, if applicable) data set

Return type:

funki.input.DataSet
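
The arithmetic behind this normalization can be sketched with plain NumPy. The function normalize_total below is a hypothetical stand-in operating on a cells-by-genes array, not the funki implementation, and its handling of cells with zero total counts (left as zeros) is an assumption.

```python
import numpy as np


def normalize_total(counts, target_sum=1e6, log_transform=False):
    """Hypothetical stand-in: scale each cell (row) to sum to target_sum."""
    counts = np.asarray(counts, dtype=float)

    # Total counts per cell, kept as a column so it broadcasts over genes
    totals = counts.sum(axis=1, keepdims=True)

    # Per-cell scaling factor; assumption: cells with no counts stay at zero
    scale = np.divide(target_sum, totals,
                      out=np.zeros_like(totals), where=totals > 0)
    normalized = counts * scale

    # Optional log(X + 1) transformation of the normalized values
    return np.log1p(normalized) if log_transform else normalized


# Toy matrix: 2 cells x 2 genes
counts = [[1, 3], [10, 10]]
norm = normalize_total(counts, target_sum=100)
log_norm = normalize_total(counts, target_sum=100, log_transform=True)
```

After scaling, every row of norm sums to the chosen target_sum, so cells with different sequencing depths become directly comparable.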