Upstream

Filtering

mina.up.filter_anndata_by_ncells(anndata_dict, min_cells)

Filter samples by the number of cells in .obs['psbulk_cells'].

Updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with AnnData objects as values.

required
min_cells int or dict[str, int]

If int, the same minimum number of cells is applied to all AnnData objects. If dict, must have the same keys as anndata_dict, where each value is the minimum number of cells for that dataset.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
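The int-or-dict handling of min_cells can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper resolve_min_cells, the toy cell counts, and the choice of >= for the comparison are all assumptions.

```python
def resolve_min_cells(view_names, min_cells):
    """Expand an int or a per-view dict into one threshold per view."""
    view_names = list(view_names)
    if isinstance(min_cells, dict):
        if set(min_cells) != set(view_names):
            raise ValueError("min_cells keys must match the view names")
        return {name: min_cells[name] for name in view_names}
    return {name: min_cells for name in view_names}


# Hypothetical per-sample cell counts standing in for .obs['psbulk_cells'].
psbulk_cells = {
    "CM": {"s1": 120, "s2": 8},
    "Fib": {"s1": 40, "s2": 35},
}
thresholds = resolve_min_cells(psbulk_cells, {"CM": 10, "Fib": 50})

# Assumption: a sample is kept when its cell count is >= the threshold.
kept = {
    view: [s for s, n in cells.items() if n >= thresholds[view]]
    for view, cells in psbulk_cells.items()
}
```

With a plain int, every view receives the same threshold; with a dict, mismatched keys raise an error in this sketch.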

mina.up.filter_views_by_samples(anndata_dict, min_rows)

Remove views that have fewer samples than the specified threshold.

Also updates the .var attribute to include total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with AnnData objects as values.

required
min_rows int

Minimum number of rows required for an AnnData object to remain in the dictionary.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
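Because the dictionary is modified in place, views have to be deleted while the caller keeps the same dict object. A minimal sketch of that pattern, using plain lists of sample names in place of AnnData objects (the helper name drop_small_views is hypothetical):

```python
def drop_small_views(view_dict, min_rows):
    """Delete views with fewer than min_rows rows from view_dict in place."""
    # Iterate over a snapshot of the keys so entries can be deleted safely.
    for name in list(view_dict):
        if len(view_dict[name]) < min_rows:
            del view_dict[name]


views = {"CM": ["s1", "s2", "s3"], "Fib": ["s1"], "Endo": ["s2", "s3"]}
same_object = views
drop_small_views(views, min_rows=2)
```

After the call, views and same_object still refer to one dictionary, which now lacks the "Fib" view.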

mina.up.filter_genes_byexpr(anndata_dict, min_count, min_prop)

Filter genes by expression count prevalence within each view.

Also updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
min_count int

Minimum count threshold for filtering genes.

required
min_prop float

Minimum proportion of samples (rows) in which a gene's count must be >= min_count for the gene to be kept.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
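The prevalence rule described by min_count and min_prop can be illustrated on a small count table. The sketch below uses nested lists rather than AnnData, the helper name is hypothetical, and comparing the proportion with >= is an assumption.

```python
def genes_passing_prevalence(counts, gene_names, min_count, min_prop):
    """Return genes whose count is >= min_count in enough samples."""
    n_samples = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        n_pass = sum(1 for row in counts if row[j] >= min_count)
        # Assumption: the proportion is compared with >=.
        if n_pass / n_samples >= min_prop:
            kept.append(gene)
    return kept


# Rows are samples, columns are genes (toy values).
counts = [
    [12, 0, 30],
    [15, 1, 0],
    [9, 0, 25],
    [20, 2, 40],
]
genes = ["g1", "g2", "g3"]
kept_genes = genes_passing_prevalence(counts, genes, min_count=10, min_prop=0.5)
```

Here g1 and g3 reach the count threshold in three of four samples, while g2 never does.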

mina.up.filter_views_by_genes(anndata_dict, min_genes_per_view)

Drop views with fewer genes than the specified threshold.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
min_genes_per_view int

Minimum number of genes (columns) that must remain in an AnnData object for it to be kept.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.

mina.up.filter_samples_by_coverage(anndata_dict, threshold, min_prop)

Filter samples by the proportion of genes above a coverage threshold.

Updates the dictionary in place and updates the .var attribute with total counts per gene.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
threshold float or dict[str, float]

Count threshold that a gene's value must exceed for the gene to count as covered in a sample. If a dict, keys must match anndata_dict.

required
min_prop float or dict[str, float]

Minimum proportion of genes that must exceed the threshold for a sample to be kept. If a dict, keys must match anndata_dict.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
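As a rough illustration of the coverage rule, the following sketch computes, per sample, the proportion of genes whose value exceeds the threshold. It works on nested lists, the helper name is hypothetical, and keeping samples whose proportion is >= min_prop is an assumption.

```python
def samples_passing_coverage(counts, sample_names, threshold, min_prop):
    """Return samples in which enough genes exceed the count threshold."""
    kept = []
    for name, row in zip(sample_names, counts):
        covered = sum(1 for value in row if value > threshold)
        # Assumption: the proportion is compared with >=.
        if covered / len(row) >= min_prop:
            kept.append(name)
    return kept


# Rows are samples, columns are genes (toy values).
counts = [
    [5, 0, 3, 1],
    [0, 0, 2, 0],
]
kept_samples = samples_passing_coverage(
    counts, ["s1", "s2"], threshold=0, min_prop=0.5
)
```

Sample s1 has three of four genes above zero and is kept; s2 has only one and is dropped.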

mina.up.filter_genes_by_celltype(anndata_dict, gene_lists)

Exclude view-specific gene lists from AnnData objects.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with cell types as keys and AnnData objects as values.

required
gene_lists dict[str, list[str]]

Dictionary with cell types as keys and lists of genes to exclude.

required

Returns:

Type Description
None

The function modifies the input AnnData objects in place.

mina.up.filter_smpls_by_nview(anndata_dict, min_views)

Remove samples that appear in too few views.

A sample (identified by its .obs.index) is kept only if it is present in min_views or more AnnData objects (views). The input dictionary is updated in place, with each AnnData object subset to the eligible samples.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view or cell-type names as keys and AnnData objects as values. Sample identifiers are taken from adata.obs.index and must be comparable across views.

required
min_views int

Minimum number of views in which a sample must be present to be retained.

required

Returns:

Type Description
None

The function modifies the input dictionary in place.
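The counting logic can be sketched with lists of sample identifiers standing in for each view's .obs.index. The helper name and the toy data are hypothetical; only the "min_views or more" rule comes from the description above.

```python
from collections import Counter


def samples_in_enough_views(view_samples, min_views):
    """Return sample IDs present in at least min_views views."""
    n_views = Counter(
        sample for samples in view_samples.values() for sample in set(samples)
    )
    return {sample for sample, n in n_views.items() if n >= min_views}


view_samples = {
    "CM": ["s1", "s2", "s3"],
    "Fib": ["s1", "s3"],
    "Endo": ["s3", "s4"],
}
eligible = samples_in_enough_views(view_samples, min_views=2)

# Subset every view to the eligible samples, preserving the original order.
subset = {
    view: [s for s in samples if s in eligible]
    for view, samples in view_samples.items()
}
```

Samples s2 and s4 each appear in a single view, so they are removed from every view.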

mina.up.get_hvgs(anndata_dict, groupby=None, ngroups_cut=2)

Identify genes to exclude for each AnnData object, based on HVG masking.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view/cell-type keys and AnnData objects as values.

required
groupby str

Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.

None
ngroups_cut int

Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.

2

Returns:

Type Description
dict[str, list[str]]

Dictionary with cell types as keys and lists of non-variable genes to exclude as values.
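The role of ngroups_cut can be sketched as a count over per-group HVG calls. The HVG detection itself is not shown; hvgs_per_group below is hypothetical input, as is the helper name.

```python
from collections import Counter


def genes_to_exclude(all_genes, hvgs_per_group, ngroups_cut):
    """Return genes that are highly variable in fewer than ngroups_cut groups."""
    n_groups = Counter(
        gene for hvgs in hvgs_per_group.values() for gene in set(hvgs)
    )
    return [gene for gene in all_genes if n_groups[gene] < ngroups_cut]


all_genes = ["g1", "g2", "g3", "g4"]
# Hypothetical HVG calls per batch for a single view.
hvgs_per_group = {
    "batch1": ["g1", "g2"],
    "batch2": ["g1", "g3"],
    "batch3": ["g1", "g2"],
}
excluded = genes_to_exclude(all_genes, hvgs_per_group, ngroups_cut=2)
```

Genes g3 (variable in one batch) and g4 (variable in none) fall below the cut and end up in the exclusion list.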

mina.up.filter_hvgs(anndata_dict, groupby=None, ngroups_cut=None)

Identify highly variable genes (HVGs) for each AnnData object and filter out non-HVGs.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with view or cell-type names as keys and AnnData objects as values.

required
groupby str

Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.

None
ngroups_cut int

Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.

None

Returns:

Type Description
None

The input AnnData objects are updated in place, with non-HVGs filtered out and HVG-related annotation columns dropped from .var.

Preprocessing

mina.up.extract_metadata_from_obs(obs: pd.DataFrame, groupby: str, sort: bool = False) -> pd.DataFrame

Extract group-level metadata from an observation table.

Only columns with a single unique value per group are retained.

Parameters:

Name Type Description Default
obs DataFrame

Observation metadata (e.g., AnnData.obs).

required
groupby str

Column used to define groups.

required
sort bool

Whether to apply natural sorting to group identifiers.

False

Returns:

Type Description
DataFrame

Group-level metadata table.
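The "single unique value per group" rule and natural sorting can be illustrated without pandas. The sketch below uses a list of row dictionaries in place of a DataFrame; both helper names and the toy rows are hypothetical.

```python
import re


def natural_key(text):
    """Sort key that orders 's2' before 's10'."""
    parts = re.split(r"(\d+)", str(text))
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def group_level_metadata(rows, groupby, sort=False):
    """Keep only columns that are constant within every group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[groupby], []).append(row)
    columns = [c for c in rows[0] if c != groupby]
    constant = [
        c
        for c in columns
        if all(len({r[c] for r in members}) == 1 for members in groups.values())
    ]
    order = sorted(groups, key=natural_key) if sort else list(groups)
    return {g: {c: groups[g][0][c] for c in constant} for g in order}


rows = [
    {"sample": "s10", "condition": "HF", "cell_type": "CM"},
    {"sample": "s10", "condition": "HF", "cell_type": "Fib"},
    {"sample": "s2", "condition": "ctrl", "cell_type": "CM"},
    {"sample": "s2", "condition": "ctrl", "cell_type": "CM"},
]
meta = group_level_metadata(rows, groupby="sample", sort=True)
```

The cell_type column varies within sample s10, so it is dropped; condition is constant within each sample and is retained.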

mina.up.split_anndata_by_celltype(pdata, grouping='cell_type')

Split an AnnData object into multiple AnnData objects by cell type.

Parameters:

Name Type Description Default
pdata AnnData

Input AnnData object.

required
grouping str

Column in pdata.obs defining cell types.

'cell_type'

Returns:

Type Description
dict[str, AnnData]

Dictionary mapping cell types to AnnData objects.

mina.up.norm_log(anndata_dict, target_sum=1000000.0, exclude_highly_expressed=False, max_value=None, center=True)

Normalize, log-transform, and scale AnnData objects in place.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary of AnnData objects.

required
target_sum float

Target total count per sample after normalization.

1000000.0
exclude_highly_expressed bool

Whether to exclude highly expressed genes during normalization.

False
max_value float or None

Maximum value after scaling to clip outliers.

None
center bool

Whether to center features during scaling.

True

Returns:

Type Description
None

The input dictionary is modified in place.
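The three steps (normalize to target_sum, log-transform, scale) can be sketched for a single sample and a single feature. This is not the library's code: the use of log1p, the sample standard deviation (n - 1 denominator), and clipping only the upper side are assumptions made for illustration.

```python
import math


def normalize_log(row, target_sum=1e6):
    """Scale one sample's counts to target_sum, then apply log(1 + x)."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [math.log1p(value * target_sum / total) for value in row]


def scale_feature(values, center=True, max_value=None):
    """Scale one feature across samples to unit variance."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample variance with an n - 1 denominator.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) or 1.0
    shift = mean if center else 0.0
    scaled = [(v - shift) / std for v in values]
    if max_value is not None:
        # Assumption: only values above max_value are clipped.
        scaled = [min(v, max_value) for v in scaled]
    return scaled


logged = normalize_log([1, 1, 2], target_sum=4)
scaled = scale_feature([1.0, 2.0, 3.0], center=True, max_value=0.5)
```

In practice the normalization runs per sample (row) and the scaling per feature (column); the two helpers are separated here to make that distinction visible.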

Utils

mina.up.save_raw_counts(anndata_dict, layer_name='raw_counts')

Store raw count data in a layer for each AnnData object.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary of AnnData objects.

required
layer_name str

Name of the layer used to store raw counts.

'raw_counts'

Returns:

Type Description
None

The input dictionary is modified in place.

mina.up.append_view_to_var(anndata_dict, join=':')

Prefix feature names in each AnnData with its dict key and join separator.

This modifies the AnnData objects in place. For example, if the key is "CM" and a gene is "gene1", the new var name becomes "CM:gene1" when join=":".

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary mapping views to AnnData objects.

required
join str

Separator placed between the view name and the feature name.

':'

Returns:

Type Description
None

Updates .var_names in place.
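A small sketch of the renaming, with lists of feature names standing in for each object's .var_names (the helper name prefix_var_names is hypothetical). Prefixing makes feature names unique across views, which matters once views are combined into one feature space.

```python
def prefix_var_names(view_vars, join=":"):
    """Prefix every feature name with its view key, in place."""
    for view, names in view_vars.items():
        # Slice assignment keeps the same list object, mirroring in-place use.
        names[:] = [f"{view}{join}{name}" for name in names]


view_vars = {"CM": ["gene1", "gene2"], "Fib": ["gene1"]}
prefix_var_names(view_vars, join=":")
all_names = [name for names in view_vars.values() for name in names]
```

Before prefixing, "gene1" occurs in both views; afterwards every name in all_names is distinct.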

mina.up.merge_adata_views(studies: list[dict[str, AnnData]], study_names: list[str], view_mode: str = 'union', min_view_studies: int = 2, var_mode: str = 'outer', min_var_studies: int = 2) -> dict[str, AnnData]

Merge multiple study-level AnnData dictionaries into unified views.

Parameters:

Name Type Description Default
studies list[dict[str, AnnData]]

List of study dictionaries, each mapping view names to AnnData objects.

required
study_names list[str]

Unique identifiers for each study. Must align with studies.

required
view_mode {'union', 'intersection', 'min_n'}

Strategy for selecting views across studies.

'union'
min_view_studies int

Minimum number of studies required when view_mode='min_n'.

2
var_mode {'inner', 'outer', 'min_n'}

Strategy for merging variables (features).

'outer'
min_var_studies int

Minimum number of studies required when var_mode='min_n'.

2

Assumptions:

- Observation columns are harmonized across studies.
- Observation names are unique across studies.
- Feature names are harmonized across studies.
- View names are consistent across studies.
- study_names uniquely identify studies.

Returns:

Name Type Description
merged dict[str, AnnData]

Dictionary of merged AnnData objects, one per retained view.

Keys: each key corresponds to a view (modality/cell type) retained according to view_mode across the input studies.

Values: each value is an AnnData object resulting from concatenating the corresponding AnnData objects from all studies that contain that view.

Guarantees:

- `.obs` columns: only columns present in all contributing studies
  are retained (strict intersection).
- `.obs_names` (row identifiers): all original observation names
  are preserved; duplicates across studies are not allowed.
- `.obs["study"]`: column indicating the study of origin for each
  observation, using the names provided in `study_names`.
- Variables (features), depending on var_mode:
    * `"inner"` → only variables present in all contributing studies
    * `"outer"` → all variables present in at least one contributing study
    * `"min_n"` → variables present in at least `min_var_studies` studies
- `.uns` and other metadata are merged conservatively with unique keys.
- The resulting AnnData objects are copies; modifying them will
  not affect the original input studies.
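The selection rules behind view_mode and var_mode share one idea: count in how many studies an item occurs and compare against a required minimum. The sketch below applies that idea to sets of names only; it does not concatenate AnnData objects, the helper name is hypothetical, and the sorted output order is an assumption.

```python
from collections import Counter


def select_by_mode(collections, mode, min_n=2):
    """Select items by how many of the given collections contain them."""
    counts = Counter(item for c in collections for item in set(c))
    if mode in ("union", "outer"):
        needed = 1
    elif mode in ("intersection", "inner"):
        needed = len(collections)
    elif mode == "min_n":
        needed = min_n
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(item for item, n in counts.items() if n >= needed)


# Three hypothetical studies: view name -> feature names.
studies = [
    {"CM": ["g1", "g2", "g3"], "Fib": ["g1", "g2"]},
    {"CM": ["g2", "g3", "g4"]},
    {"CM": ["g3", "g5"], "Fib": ["g2", "g6"]},
]
views_union = select_by_mode(studies, "union")
views_all = select_by_mode(studies, "intersection")

# Variables are selected among the studies that contribute the view.
cm_vars = [study["CM"] for study in studies if "CM" in study]
cm_inner = select_by_mode(cm_vars, "inner")
cm_min2 = select_by_mode(cm_vars, "min_n", min_n=2)
```

Note that the variable count is taken over contributing studies only, so a view present in two of three studies uses two as the denominator for "inner".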

mina.up.convert_views_to_functions(anndata_dict, net, tmin=5)

Apply decoupler's univariate linear model (ULM) to each AnnData object using the provided network.

Rewrites the input dictionary in place, replacing each AnnData object with the result of the decoupler analysis.

Parameters:

Name Type Description Default
anndata_dict dict[str, AnnData]

Dictionary with AnnData objects as values.

required
net DataFrame

Long-format (tidy) DataFrame representing a network, where each row defines an interaction between a source and a target.

Required columns:

- source: identifier of the source node
- target: identifier of the target node

Optional columns:

- weight: numeric value representing interaction strength

required
tmin int

Minimum number of targets required per source. Sources with fewer than tmin associated targets are filtered out.

5

Returns:

Type Description
None

The function modifies the input dictionary in place.
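The effect of tmin can be sketched on a network given as (source, target, weight) tuples. The ULM scoring itself is not reproduced. Counting only targets that are present among the data's features is an assumption about how the filter is applied, and the helper name is hypothetical.

```python
from collections import Counter


def filter_sources_by_tmin(net_rows, features, tmin=5):
    """Drop sources with fewer than tmin targets among the given features."""
    # Assumption: targets absent from the data do not count towards tmin.
    present = [(s, t, w) for s, t, w in net_rows if t in features]
    n_targets = Counter(source for source, _, _ in present)
    return [(s, t, w) for s, t, w in present if n_targets[s] >= tmin]


net_rows = [
    ("TF1", "g1", 1.0),
    ("TF1", "g2", -1.0),
    ("TF1", "g3", 1.0),
    ("TF2", "g1", 1.0),
    ("TF2", "g9", 1.0),
]
features = {"g1", "g2", "g3"}
filtered = filter_sources_by_tmin(net_rows, features, tmin=2)
```

TF2 has two targets in the network but only one among the features, so it is removed under this assumption.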

mina.up.make_membership_matrix(adata, pathways_df, gene_col='genesymbol', pathway_col='pathway')

Build a boolean gene × pathway membership matrix aligned with adata.var.index.

Parameters:

Name Type Description Default
adata AnnData

AnnData object with genes in .var.index

required
pathways_df DataFrame

Long-format DataFrame with at least two columns: gene_col and pathway_col

required
gene_col str

Column in pathways_df containing gene names

'genesymbol'
pathway_col str

Column in pathways_df containing pathway names

'pathway'

Returns:

Type Description
DataFrame

Boolean DataFrame (rows=adata.var.index, columns=unique pathways)
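A minimal sketch of the membership matrix using nested lists instead of a DataFrame. The helper name is hypothetical, and sorting the pathway columns alphabetically is an assumption; genes in the pathway table that are absent from the gene index are simply ignored.

```python
def membership_matrix(var_names, pathway_rows):
    """Return pathway names and a gene x pathway boolean matrix."""
    pathways = sorted({pathway for _, pathway in pathway_rows})
    members = set(pathway_rows)
    matrix = [[(gene, p) in members for p in pathways] for gene in var_names]
    return pathways, matrix


var_names = ["A", "B", "C"]
# Long-format (gene, pathway) pairs; "Z" is not in var_names.
pathway_rows = [("A", "P1"), ("B", "P1"), ("B", "P2"), ("Z", "P2")]
pathways, matrix = membership_matrix(var_names, pathway_rows)
```

Rows follow var_names exactly, so gene "C", which belongs to no pathway, still gets a row of False values.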

mina.up.get_nhood_enrichment_feats(adata, metadata, sample_key: str = 'biosample_id', cluster_key: str = 'celltype', spatial_key: str = 'spatial', coord_type: str = 'generic', n_perms: int = 1000, diagonal: bool = True, symmetric: bool = True, fillna: float | None = 0.0)

Build sample-level neighborhood enrichment features.

Parameters:

Name Type Description Default
adata AnnData

AnnData object containing spatial coordinates and observation metadata.

required
metadata DataFrame

Sample-level metadata indexed by sample_key.

required
sample_key str

Column in adata.obs defining samples/patients.

'biosample_id'
cluster_key str

Column in adata.obs defining cell types or clusters.

'celltype'
spatial_key str

Key in adata.obsm containing spatial coordinates.

'spatial'
coord_type str

Coordinate type passed to squidpy.gr.spatial_neighbors.

'generic'
n_perms int

Number of permutations for neighborhood enrichment.

1000
diagonal bool

Whether to keep same-celltype features, e.g. T__T.

True
symmetric bool

Whether to keep only one triangle of the celltype-pair matrix.

True
fillna float or None

Value used to replace NaN z-scores. Set to None to keep NaNs.

0.0

Returns:

Name Type Description
spatial_interaction_adata AnnData

AnnData object with samples as observations and celltype-pair neighborhood enrichment z-scores as variables.
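How diagonal and symmetric shape the feature set can be sketched by enumerating cell-type pairs. The "__" separator follows the T__T example above; the helper name and the choice of the upper triangle when symmetric=True are assumptions. The enrichment z-scores themselves are not computed here.

```python
def pair_feature_names(cell_types, diagonal=True, symmetric=True, sep="__"):
    """List cell-type pair feature names under the two flags."""
    names = []
    for i, first in enumerate(cell_types):
        for j, second in enumerate(cell_types):
            if not diagonal and i == j:
                continue
            # Assumption: the upper triangle is the one that is kept.
            if symmetric and j < i:
                continue
            names.append(f"{first}{sep}{second}")
    return names


default_names = pair_feature_names(["B", "T"])
off_diagonal = pair_feature_names(["B", "T"], diagonal=False)
full = pair_feature_names(["B", "T"], symmetric=False)
```

For k cell types the default yields k(k+1)/2 features, the off-diagonal variant k(k-1)/2, and the full matrix k².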

mina.up.get_cell_props(adata: AnnData, sample_key: str, cell_type_key: str, metadata: pd.DataFrame) -> AnnData

Build sample-level centered log-ratio (CLR) cell type composition features.

Parameters:

Name Type Description Default
adata AnnData

AnnData object with adata.obs containing sample and cell type or grouping information.

required
sample_key str

Column in adata.obs defining samples/patients.

required
cell_type_key str

Column in adata.obs defining cell types or clusters.

required
metadata DataFrame

DataFrame containing metadata for the samples, indexed by sample_key.

required

Returns:

Name Type Description
clr_props_adata AnnData

AnnData object with samples as observations and centered log-ratio (CLR) transformed cell type proportions as variables.
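For reference, the centered log-ratio transform of one sample's cell type counts can be sketched as follows. The pseudocount used to avoid log(0) is an assumption; how the function itself handles zero counts is not stated here.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell type counts."""
    # Assumption: a pseudocount is added so that zero counts are defined.
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    logs = [math.log(value / total) for value in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


balanced = clr([10, 10, 10])
skewed = clr([90, 9, 0])
```

CLR values sum to zero within a sample, and a perfectly balanced composition maps to all zeros.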