Upstream

Filtering

mina.up.filter_anndata_by_ncells(anndata_dict, min_cells)

Filters out samples (rows) from AnnData objects in the dictionary where the number of cells (psbulk_cells) in .obs is less than the specified threshold.

Updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with AnnData objects as values.

required
min_cells int or dict
  • If int, the same minimum number of cells is applied to all AnnData objects.
  • If dict, must have the same keys as anndata_dict, where each value is the minimum number of cells for that dataset.
required

Returns:

Type Description
None

The function modifies the input dictionary in place.
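The two accepted forms of min_cells can be pictured as a per-view threshold lookup. The following is a minimal sketch, not mina's implementation: plain lists of per-sample cell counts stand in for the psbulk_cells column of .obs, and `resolve_min_cells` and the toy data are hypothetical names introduced only for illustration.

```python
def resolve_min_cells(keys, min_cells):
    """Return one threshold per view from an int or a dict (hypothetical helper)."""
    if isinstance(min_cells, dict):
        if set(min_cells) != set(keys):
            raise ValueError("min_cells keys must match anndata_dict keys")
        return dict(min_cells)
    # A single int is broadcast to every view.
    return {key: min_cells for key in keys}


# Toy stand-in for .obs["psbulk_cells"]: per-sample cell counts for each view.
cell_counts = {"CM": [50, 8, 120], "Fib": [30, 25, 4]}

thresholds = resolve_min_cells(cell_counts, {"CM": 10, "Fib": 20})

# Samples with fewer cells than the view's threshold are dropped.
kept = {
    view: [n for n in counts if n >= thresholds[view]]
    for view, counts in cell_counts.items()
}
print(kept)
```

Passing a dict makes it possible to use a stricter threshold for abundant cell types and a more lenient one for rare cell types.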

mina.up.filter_views_by_samples(anndata_dict, min_rows)

Filters out AnnData objects in the dictionary that have fewer samples (rows) than the specified threshold.

Also updates the .var attribute to include total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with AnnData objects as values.

required
min_rows int

Minimum number of rows required for an AnnData object to remain in the dictionary.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
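Because the function returns None and mutates the dictionary, views that fall below min_rows simply disappear from the caller's dictionary. The sketch below illustrates that pattern with lists of sample names standing in for AnnData objects; `drop_small_views` is a hypothetical helper, not part of mina.

```python
def drop_small_views(views, min_rows):
    """Remove entries with fewer than min_rows rows, mutating `views` (hypothetical)."""
    # Iterate over a snapshot of the keys so deleting inside the loop is safe.
    for key in list(views):
        n_rows = len(views[key])  # stands in for adata.n_obs
        if n_rows < min_rows:
            del views[key]


views = {"CM": ["s1", "s2", "s3"], "Fib": ["s1"], "Endo": ["s1", "s2"]}
result = drop_small_views(views, min_rows=2)

print(result)        # nothing is returned
print(list(views))   # the input dictionary itself has changed
```

Make a copy of the dictionary beforehand if the unfiltered set of views is still needed later.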

mina.up.filter_genes_byexpr(anndata_dict, min_count, min_prop)

Filters genes in AnnData objects in the given dictionary based on count proportions, keeping all rows and filtering columns.

Also updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
min_count int

Minimum count threshold for filtering genes.

required
min_prop float

Minimum proportion of samples (rows) where the count is >= min_count.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
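The rule described by min_count and min_prop can be written as a column mask over the count matrix. This is a sketch of the rule as documented above, using a small NumPy array in place of adata.X; the actual implementation may differ in details such as how ties or sparse matrices are handled.

```python
import numpy as np

# Toy pseudobulk matrix: 4 samples (rows) x 3 genes (columns).
X = np.array([
    [12, 0, 3],
    [15, 1, 0],
    [9, 0, 11],
    [20, 2, 14],
])
min_count, min_prop = 10, 0.5

# Fraction of samples in which each gene reaches min_count.
prop_expressed = (X >= min_count).mean(axis=0)

# Keep genes that reach min_count in at least min_prop of the samples.
keep_genes = prop_expressed >= min_prop

# All rows are kept; only columns are filtered.
X_filtered = X[:, keep_genes]
print(keep_genes, X_filtered.shape)
```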

mina.up.filter_views_by_genes(anndata_dict, min_genes_per_view)

Drops AnnData objects from the dictionary that have fewer than the specified number of genes (columns) after filtering.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
min_genes_per_view int

Minimum number of genes (columns) that must remain in an AnnData object for it to be kept.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.

mina.up.filter_samples_by_coverage(anndata_dict, threshold, min_prop)

Filters out samples in AnnData objects that do not have a sufficient proportion of genes with values greater than a specified threshold.

Updates the dictionary in place and updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
threshold float

The value a gene's count must exceed for the gene to be counted as detected in a sample. Typically left at 0.

required
min_prop float

Minimum proportion of genes that must exceed the threshold for a sample to be kept.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
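Sample coverage is the row-wise counterpart of gene filtering: for each sample, compute the fraction of genes above the threshold and compare it with min_prop. The sketch below uses a NumPy array in place of adata.X; treating a coverage exactly equal to min_prop as sufficient is an assumption.

```python
import numpy as np

# Toy matrix: 3 samples (rows) x 4 genes (columns).
X = np.array([
    [5, 0, 2, 1],
    [0, 0, 0, 3],
    [1, 4, 0, 0],
])
threshold, min_prop = 0, 0.5

# Fraction of genes whose value exceeds the threshold, per sample.
coverage = (X > threshold).mean(axis=1)

# Assumption: a sample is kept when coverage >= min_prop.
keep_samples = coverage >= min_prop

X_filtered = X[keep_samples, :]
print(coverage, keep_samples)
```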

mina.up.filter_genes_by_celltype(anndata_dict, gene_lists)

Filters out genes from AnnData objects based on provided lists of genes to exclude.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
gene_lists dict[str, list[str]]

Dictionary with cell types as keys and lists of genes to exclude.

required

Returns:

Type Description
None

The function modifies the input AnnData objects in place.
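The shape of gene_lists mirrors anndata_dict: one exclusion list per cell type. A minimal sketch of the exclusion step, with lists of gene names standing in for .var_names; `exclude_genes` is hypothetical, and leaving a cell type untouched when it has no entry in gene_lists is an assumption.

```python
def exclude_genes(var_names, genes_to_exclude):
    """Return var_names without the excluded genes, preserving order (hypothetical)."""
    excluded = set(genes_to_exclude)
    return [gene for gene in var_names if gene not in excluded]


# Toy stand-ins for the .var_names of each cell type.
var_names = {
    "CM": ["TTN", "MYH7", "COL1A1"],
    "Fib": ["COL1A1", "DCN", "TTN"],
}
gene_lists = {"CM": ["COL1A1"], "Fib": ["TTN"]}

filtered = {
    cell_type: exclude_genes(names, gene_lists.get(cell_type, []))
    for cell_type, names in var_names.items()
}
print(filtered)
```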

mina.up.filter_smpls_by_nview(anndata_dict, min_views)

Filters out samples in AnnData objects that do not appear in at least a minimum number of views.

A sample (identified by its .obs.index) is kept only if it is present in min_views or more AnnData objects (views). The input dictionary is updated in place, with each AnnData object subset to the eligible samples.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view or cell-type names as keys and AnnData objects as values. Sample identifiers are taken from adata.obs.index and must be comparable across views.

required
min_views int

Minimum number of views in which a sample must be present to be retained.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
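The eligibility rule amounts to counting, for each sample identifier, the number of views that contain it. The sketch below illustrates this with lists of sample names standing in for adata.obs.index; it is not the library's code.

```python
from collections import Counter

# Toy stand-ins for adata.obs.index in each view.
sample_ids = {
    "CM": ["s1", "s2", "s3"],
    "Fib": ["s1", "s3"],
    "Endo": ["s3", "s4"],
}
min_views = 2

# Count in how many views each sample appears (set() guards against repeats).
n_views = Counter(s for ids in sample_ids.values() for s in set(ids))

# A sample is eligible when it is present in min_views or more views.
eligible = {s for s, n in n_views.items() if n >= min_views}

# Each view is subset to the eligible samples.
subset = {
    view: [s for s in ids if s in eligible]
    for view, ids in sample_ids.items()
}
print(subset)
```

Note that a view can end up with very few samples after this step, so a view-level filter such as filter_views_by_samples may be worth applying afterwards.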

mina.up.get_hvgs(anndata_dict, groupby=None, ngroups_cut=2)

Identify genes to exclude for each AnnData object, based on HVG masking.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view/cell-type keys and AnnData objects as values.

required
groupby str

Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.

None
ngroups_cut int

Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.

2

Returns:

Type Description
dict[str, list[str]]

Dictionary with view/cell-type keys and lists of non-highly-variable genes to exclude.
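When groupby is set, ngroups_cut acts as a vote threshold across groups. The sketch below shows that idea for a single view, assuming the per-group HVG sets have already been computed; how mina identifies HVGs within each group is not shown here, and all names and data are hypothetical.

```python
from collections import Counter

all_genes = ["g1", "g2", "g3", "g4"]

# Hypothetical per-group HVG calls for one view (e.g. one set per batch).
hvgs_per_group = {
    "batch1": {"g1", "g2"},
    "batch2": {"g1", "g3"},
    "batch3": {"g1", "g2"},
}
ngroups_cut = 2

# Number of groups in which each gene was called highly variable.
n_groups_hv = Counter(g for hvgs in hvgs_per_group.values() for g in hvgs)

# Genes that are highly variable in fewer than ngroups_cut groups are excluded.
to_exclude = [g for g in all_genes if n_groups_hv[g] < ngroups_cut]
print(to_exclude)
```

The returned dictionary has the same shape as the gene_lists argument of filter_genes_by_celltype.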

mina.up.filter_hvgs(anndata_dict, groupby=None, ngroups_cut=None)

Identify highly variable genes (HVGs) for each AnnData object and filter out non-HVGs.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view or cell-type names as keys and AnnData objects as values.

required
groupby str

Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.

None
ngroups_cut int

Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.

None

Returns:

Type Description
None

The input AnnData objects are updated in place, with non-HVGs filtered out and HVG-related annotation columns dropped from .var.

Preprocessing

mina.up.extract_metadata_from_obs(obs: pd.DataFrame, groupby: str, sort: bool = False) -> pd.DataFrame

Extract group-level metadata from an observation table.

Only columns with a single unique value per group are retained.

Parameters:

Name Type Description Default
obs DataFrame

Observation metadata (e.g., AnnData.obs).

required
groupby str

Column used to define groups.

required
sort bool

Whether to apply natural sorting to group identifiers.

False

Returns:

Type Description
DataFrame

Group-level metadata table.
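The "single unique value per group" rule can be expressed with a pandas group-by. This is a sketch of the rule only: the toy table and column names are made up, natural sorting (the sort option) is not covered, and counting missing values as distinct values is an assumption.

```python
import pandas as pd

# Toy cell-level observation table.
obs = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2"],
    "condition": ["ctrl", "ctrl", "disease", "disease"],
    "cell_type": ["CM", "Fib", "CM", "Fib"],
})
groupby = "sample"

value_cols = [col for col in obs.columns if col != groupby]

# Number of distinct values of each column within each group.
n_unique = obs.groupby(groupby)[value_cols].nunique(dropna=False)

# Keep only columns that are constant within every group.
constant_cols = [col for col in value_cols if (n_unique[col] <= 1).all()]

# One row per group, holding the group-level values.
metadata = obs.groupby(groupby)[constant_cols].first()
print(metadata)
```

Here cell_type varies within each sample and is dropped, while condition is constant per sample and is kept.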

mina.up.split_anndata_by_celltype(pdata, grouping='cell_type')

Split an AnnData object into multiple AnnData objects by cell type.

Parameters:

Name Type Description Default
pdata AnnData

Input AnnData object.

required
grouping str

Column in pdata.obs defining cell types.

'cell_type'

Returns:

Type Description
dict[str, AnnData]

Dictionary mapping cell types to AnnData objects.

mina.up.norm_log(anndata_dict, target_sum=1000000.0, exclude_highly_expressed=False, max_value=None, center=True)

Normalize, log-transform, and scale AnnData objects in place.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary of AnnData objects.

required
target_sum float

Target total count per sample after normalization.

1000000.0
exclude_highly_expressed bool

Whether to exclude highly expressed genes during normalization.

False
max_value float or None

Maximum value after scaling to clip outliers.

None
center bool

Whether to center features during scaling.

True

Returns:

Type Description
None

The input dictionary is modified in place.
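The three steps can be sketched on a dense NumPy array. This is an illustration under stated assumptions, not mina's implementation: the sample standard deviation (ddof=1) and symmetric clipping at max_value are assumptions, and exclude_highly_expressed is not modeled.

```python
import numpy as np

# Toy pseudobulk counts: 3 samples (rows) x 3 genes (columns).
X = np.array([
    [10.0, 30.0, 60.0],
    [5.0, 5.0, 90.0],
    [20.0, 20.0, 60.0],
])
target_sum = 1e6
max_value = None
center = True

# 1) Normalize: rescale each sample so its counts sum to target_sum.
X_norm = X / X.sum(axis=1, keepdims=True) * target_sum

# 2) Log-transform with log(1 + x).
X_log = np.log1p(X_norm)

# 3) Scale each gene to unit variance, optionally centering it at zero.
mean = X_log.mean(axis=0)
std = X_log.std(axis=0, ddof=1)  # assumption: sample standard deviation
std[std == 0] = 1.0              # avoid division by zero for constant genes
X_scaled = (X_log - mean) / std if center else X_log / std

# Optional clipping of extreme values (symmetric clipping is an assumption).
if max_value is not None:
    X_scaled = np.clip(X_scaled, -max_value, max_value)
```

With target_sum=1000000.0 the normalized values are counts per million.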

Utils

mina.up.save_raw_counts(anndata_dict, layer_name='raw_counts')

Store raw count data in a layer for each AnnData object.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary of AnnData objects.

required
layer_name str

Name of the layer used to store raw counts.

'raw_counts'

Returns:

Type Description
None

The input dictionary is modified in place.

mina.up.append_view_to_var(anndata_dict, join=':')

Prefix the feature names of each AnnData object with its dictionary key, followed by the join separator.

This modifies the AnnData objects in-place. For example, if the key is "CM" and a gene is "gene1", the new var name becomes "CM:gene1" when join=":".

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary mapping views to AnnData objects.

required
join str

Separator placed between the view name and the feature name.

':'

Returns:

Type Description
None

Updates .var_names in place.
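The renaming itself is a simple string join. The sketch below reproduces the "CM:gene1" example with lists of names standing in for .var_names; `prefix_names` is a hypothetical helper.

```python
def prefix_names(view, var_names, join=":"):
    """Return feature names prefixed with the view name (hypothetical helper)."""
    return [f"{view}{join}{name}" for name in var_names]


# Toy stand-ins for the .var_names of two views.
var_names = {"CM": ["gene1", "gene2"], "Fib": ["gene1"]}

prefixed = {view: prefix_names(view, names) for view, names in var_names.items()}
print(prefixed)
```

After prefixing, a gene measured in several views has a distinct feature name in each of them.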

mina.up.merge_adata_views(studies: list[dict[str, AnnData]], study_names: list[str], view_mode: str = 'union', min_view_studies: int = 2, var_mode: str = 'outer', min_var_studies: int = 2) -> dict[str, AnnData]

Merge multiple study-level AnnData dictionaries into unified views.

Parameters:

Name Type Description Default
studies list[dict[str, AnnData]]

List of study dictionaries, each mapping view names to AnnData objects.

required
study_names list[str]

Unique identifiers for each study. Must align with studies.

required
view_mode {'union', 'intersection', 'min_n'}

Strategy for selecting views across studies.

'union'
min_view_studies int

Minimum number of studies required when view_mode='min_n'.

2
var_mode {'inner', 'outer', 'min_n'}

Strategy for merging variables (features).

'outer'
min_var_studies int

Minimum number of studies required when var_mode='min_n'.

2

Assumptions:

- Observation columns are harmonized across studies.
- Observation names are unique across studies.
- Feature names are harmonized across studies.
- View names are consistent across studies.
- study_names uniquely identifies each study.

Returns:

Name Type Description
merged dict[str, AnnData]

Dictionary of merged AnnData objects, one per retained view.

Keys: Each key corresponds to a view (modality/cell type) retained according to view_mode across the input studies.

Values: Each value is an AnnData object resulting from concatenating the corresponding AnnData objects from all studies that contain that view. Guarantees:

- `.obs` columns: only columns present in all contributing studies
  are retained (strict intersection).
- `.obs_names` (row identifiers): all original observation names
  are preserved; duplicates across studies are not allowed.
- `.obs["study"]`: column indicating the study of origin for each
  observation, using the names provided in `study_names`.
- Variables (features), selected according to `var_mode`:
    * `"inner"` → only variables present in all contributing studies
    * `"outer"` → all variables present in at least one contributing study
    * `"min_n"` → variables present in at least `min_var_studies` studies
- `.uns` and other metadata are merged conservatively with unique keys.
- The resulting AnnData objects are copies; modifying them will 
  not affect the original input studies.
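
The view_mode and var_mode options follow the same counting logic: an item (a view or a variable) is retained depending on how many studies contain it. The sketch below illustrates that logic on plain sets of names; `select_by_mode` is a hypothetical helper and does not perform any AnnData concatenation.

```python
from collections import Counter


def select_by_mode(sets, mode, min_n=2):
    """Pick items by how many of the given sets contain them (hypothetical helper)."""
    counts = Counter(item for s in sets for item in set(s))
    if mode in ("union", "outer"):
        required = 1           # present in at least one study
    elif mode in ("intersection", "inner"):
        required = len(sets)   # present in every study
    elif mode == "min_n":
        required = min_n       # present in at least min_n studies
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(item for item, n in counts.items() if n >= required)


# View names available in each of three hypothetical studies.
views_per_study = [{"CM", "Fib"}, {"CM", "Endo"}, {"CM", "Fib"}]

union_views = select_by_mode(views_per_study, "union")
shared_views = select_by_mode(views_per_study, "intersection")
min2_views = select_by_mode(views_per_study, "min_n", min_n=2)
print(union_views, shared_views, min2_views)
```

For variables, the same selection would be applied per view, over the studies that contribute to that view.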