Upstream

Filtering

`mina.up.filter_anndata_by_ncells(anndata_dict, min_cells)`

Removes samples (rows) from each AnnData object in the dictionary whose cell count (the `psbulk_cells` column in `.obs`) is below the specified threshold.

Updates the .var attribute with total counts per gene.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with AnnData objects as values.	required
`min_cells`	`int or dict`	If int, the same minimum number of cells is applied to all AnnData objects. If dict, must have the same keys as anndata_dict, where each value is the minimum number of cells for that dataset.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.
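
The `min_cells` argument accepts either a single integer or a per-view dictionary. The sketch below illustrates how such a threshold could be resolved and applied. It is not the library's implementation: `resolve_min_cells` and the plain lists standing in for `.obs["psbulk_cells"]` are hypothetical.

```python
def resolve_min_cells(keys, min_cells):
    """Return one threshold per view from an int or a dict keyed like the views."""
    if isinstance(min_cells, dict):
        if set(min_cells) != set(keys):
            raise ValueError("min_cells must have the same keys as anndata_dict")
        return dict(min_cells)
    return {key: min_cells for key in keys}


# Hypothetical per-sample cell counts, standing in for .obs["psbulk_cells"].
psbulk_cells = {
    "CM": [120, 8, 45],
    "FB": [30, 15, 5],
}

thresholds = resolve_min_cells(psbulk_cells, {"CM": 10, "FB": 20})

# A sample is kept when its cell count is not below the view's threshold.
keep = {
    view: [n >= thresholds[view] for n in cells]
    for view, cells in psbulk_cells.items()
}
```

Passing an integer instead, for example `resolve_min_cells(psbulk_cells, 10)`, yields the same threshold for every view.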

`mina.up.filter_views_by_samples(anndata_dict, min_rows)`

Filters out AnnData objects in the dictionary that have fewer samples (rows) than the specified threshold.

Also updates the .var attribute to include total counts per gene.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with AnnData objects as values.	required
`min_rows`	`int`	Minimum number of rows required for an AnnData object to remain in the dictionary.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.

`mina.up.filter_genes_byexpr(anndata_dict, min_count, min_prop)`

Filters genes (columns) in each AnnData object in the dictionary, keeping only genes whose count is at least `min_count` in at least `min_prop` of the samples. All samples (rows) are kept.

Also updates the .var attribute with total counts per gene.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with cell types as keys and AnnData objects as values.	required
`min_count`	`int`	Minimum count threshold for filtering genes.	required
`min_prop`	`float`	Minimum proportion of samples (rows) where the count is >= min_count.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.
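
The rule described by `min_count` and `min_prop` can be sketched with NumPy on a plain samples x genes matrix. This is a minimal illustration of the parameter descriptions above, not the library's code, which may apply further logic. The helper name `genes_passing_byexpr` is hypothetical.

```python
import numpy as np


def genes_passing_byexpr(X, min_count, min_prop):
    """Boolean mask over genes (columns) of a samples x genes count matrix."""
    X = np.asarray(X)
    # Fraction of samples (rows) in which each gene reaches min_count.
    prop_expressing = (X >= min_count).mean(axis=0)
    return prop_expressing >= min_prop


counts = np.array([
    [10, 0, 3],
    [12, 1, 0],
    [9, 0, 7],
    [11, 0, 0],
])

mask = genes_passing_byexpr(counts, min_count=3, min_prop=0.5)
filtered = counts[:, mask]  # all rows kept, only passing columns retained
```

Here the first gene passes in every sample, the second in none, and the third in exactly half, so the first and third genes are kept.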

`mina.up.filter_views_by_genes(anndata_dict, min_genes_per_view)`

Drops AnnData objects from the dictionary that have fewer than the specified number of genes (columns) after filtering.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with cell types as keys and AnnData objects as values.	required
`min_genes_per_view`	`int`	Minimum number of genes (columns) that must remain in an AnnData object for it to be kept.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.

`mina.up.filter_samples_by_coverage(anndata_dict, threshold, min_prop)`

Filters out samples in AnnData objects that do not have a sufficient proportion of genes with values greater than a specified threshold.

Updates the dictionary in place and updates the .var attribute with total counts per gene.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with cell types as keys and AnnData objects as values.	required
`threshold`	`float`	Value that a gene's count must exceed for the gene to count as detected in a sample. Normally left at 0.	required
`min_prop`	`float`	Minimum proportion of genes that must exceed the threshold for a sample to be kept.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.
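
Coverage filtering works along the other axis: each sample is scored by the fraction of its genes that exceed `threshold`. The following NumPy sketch shows that rule under the assumption that the comparison is strict (`>`), as the parameter description suggests. `samples_passing_coverage` is a hypothetical helper.

```python
import numpy as np


def samples_passing_coverage(X, threshold=0.0, min_prop=0.5):
    """Boolean mask over samples (rows) of a samples x genes matrix."""
    X = np.asarray(X)
    # Fraction of genes (columns) per sample with a value above the threshold.
    coverage = (X > threshold).mean(axis=1)
    return coverage >= min_prop


counts = np.array([
    [5, 2, 0, 1],  # 3 of 4 genes above 0
    [0, 0, 0, 4],  # 1 of 4 genes above 0
    [1, 1, 1, 1],  # 4 of 4 genes above 0
])

mask = samples_passing_coverage(counts, threshold=0, min_prop=0.5)
kept = counts[mask]
```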

`mina.up.filter_genes_by_celltype(anndata_dict, gene_lists)`

Filters out genes from AnnData objects based on provided lists of genes to exclude.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with cell types as keys and AnnData objects as values.	required
`gene_lists`	`dict[str, list[str]]`	Dictionary with cell types as keys and lists of genes to exclude.	required

Returns:

Type	Description
`None`	The function modifies the input AnnData objects in place.

`mina.up.filter_smpls_by_nview(anndata_dict, min_views)`

Filters out samples in AnnData objects that do not appear in at least a minimum number of views.

A sample (identified by its .obs.index) is kept only if it is present in min_views or more AnnData objects (views). The input dictionary is updated in place, with each AnnData object subset to the eligible samples.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with view or cell-type names as keys and AnnData objects as values. Sample identifiers are taken from `adata.obs.index` and must be comparable across views.	required
`min_views`	`int`	Minimum number of views in which a sample must be present to be retained.	required

Returns:

Type	Description
`None`	The function modifies the input dictionary in place.
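
The selection of eligible samples amounts to counting, for each sample identifier, how many views contain it. A minimal sketch using only the standard library, with plain lists standing in for each view's `.obs.index` (the helper `samples_in_min_views` is hypothetical):

```python
from collections import Counter


def samples_in_min_views(sample_ids_by_view, min_views):
    """Return the set of sample IDs present in at least min_views views."""
    counts = Counter(
        sample
        for ids in sample_ids_by_view.values()
        for sample in set(ids)  # count each sample once per view
    )
    return {sample for sample, n in counts.items() if n >= min_views}


# Stand-ins for adata.obs.index in three views.
obs_index = {
    "CM": ["s1", "s2", "s3"],
    "FB": ["s1", "s3"],
    "EC": ["s3", "s4"],
}

eligible = samples_in_min_views(obs_index, min_views=2)

# Each view is then subset to the eligible samples, preserving order.
subset = {
    view: [s for s in ids if s in eligible]
    for view, ids in obs_index.items()
}
```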

`mina.up.get_hvgs(anndata_dict, groupby=None, ngroups_cut=2)`

Identify genes to exclude for each AnnData object, based on HVG masking.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with view/cell-type keys and AnnData objects as values.	required
`groupby`	`str`	Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.	`None`
`ngroups_cut`	`int`	Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.	`2`

Returns:

Type	Description
`dict[str, list[str]]`	Dictionary with cell types as keys and lists of non-variable genes to exclude as values.

`mina.up.filter_hvgs(anndata_dict, groupby=None, ngroups_cut=None)`

Identify highly variable genes (HVGs) for each AnnData object and filter out non-HVGs.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary with view or cell-type names as keys and AnnData objects as values.	required
`groupby`	`str`	Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping.	`None`
`ngroups_cut`	`int`	Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None.	`None`

Returns:

Type	Description
`None`	The input AnnData objects are updated in place, with non-HVGs filtered out and HVG-related annotation columns dropped from .var.

Preprocessing

`mina.up.extract_metadata_from_obs(obs: pd.DataFrame, groupby: str, sort: bool = False) -> pd.DataFrame`

Extract group-level metadata from an observation table.

Only columns with a single unique value per group are retained.

Parameters:

Name	Type	Description	Default
`obs`	`DataFrame`	Observation metadata (e.g., `AnnData.obs`).	required
`groupby`	`str`	Column used to define groups.	required
`sort`	`bool`	Whether to apply natural sorting to group identifiers.	`False`

Returns:

Type	Description
`DataFrame`	Group-level metadata table.
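
The retention rule (a column is kept only if it has a single unique value within every group) can be illustrated with pandas. This sketch assumes missing values count as a distinct value and ignores the `sort` option. `group_constant_columns` is a hypothetical name, not part of the library.

```python
import pandas as pd


def group_constant_columns(obs, groupby):
    """Return one row per group with the columns that are constant within groups."""
    columns = [c for c in obs.columns if c != groupby]
    # Number of distinct values per group and column (NaN counted as a value).
    n_unique = obs.groupby(groupby)[columns].nunique(dropna=False)
    is_constant = (n_unique == 1).all(axis=0)
    constant = is_constant[is_constant].index.tolist()
    return obs.groupby(groupby)[constant].first()


obs = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2"],
    "condition": ["ctrl", "ctrl", "disease", "disease"],
    "cell_type": ["CM", "FB", "CM", "FB"],
    "age": [54, 54, 61, 61],
})

meta = group_constant_columns(obs, "sample")
```

In this example `cell_type` varies within each sample and is dropped, while `condition` and `age` are carried over to the group-level table.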

`mina.up.split_anndata_by_celltype(pdata, grouping='cell_type')`

Split an AnnData object into multiple AnnData objects by cell type.

Parameters:

Name	Type	Description	Default
`pdata`	`AnnData`	Input AnnData object.	required
`grouping`	`str`	Column in `pdata.obs` defining cell types.	`'cell_type'`

Returns:

Type	Description
`dict[str, AnnData]`	Dictionary mapping cell types to AnnData objects.

`mina.up.norm_log(anndata_dict, target_sum=1000000.0, exclude_highly_expressed=False, max_value=None, center=True)`

Normalize, log-transform, and scale AnnData objects in place.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary of AnnData objects.	required
`target_sum`	`float`	Target total count per sample after normalization.	`1000000.0`
`exclude_highly_expressed`	`bool`	Whether to exclude highly expressed genes during normalization.	`False`
`max_value`	`float or None`	Maximum value after scaling to clip outliers.	`None`
`center`	`bool`	Whether to center features during scaling.	`True`

Returns:

Type	Description
`None`	The input dictionary is modified in place.
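
For orientation, the three steps can be sketched on a dense samples x genes matrix with NumPy. The exact behavior of `norm_log` is not specified here, so this rests on assumptions: library-size normalization to `target_sum`, a natural `log1p` transform, per-gene scaling to unit variance, and symmetric clipping at `max_value`. The `exclude_highly_expressed` option is omitted, and `norm_log_scale` is a hypothetical name.

```python
import numpy as np


def norm_log_scale(X, target_sum=1e6, max_value=None, center=True):
    """Normalize rows to target_sum, log1p-transform, then scale each gene."""
    X = np.asarray(X, dtype=float)

    # 1. Normalize each sample (row) so its counts sum to target_sum.
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty samples
    X = X / totals * target_sum

    # 2. Log-transform.
    X = np.log1p(X)

    # 3. Scale each gene (column) to unit variance, optionally centering it.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant genes unscaled
    if center:
        X = X - X.mean(axis=0)
    X = X / std

    # Assumption: clipping is symmetric around zero.
    if max_value is not None:
        X = np.clip(X, -max_value, max_value)
    return X


counts = np.array([[1, 9], [5, 5], [8, 2]])
scaled = norm_log_scale(counts)
clipped = norm_log_scale(counts, max_value=0.5)
```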

Utils

`mina.up.save_raw_counts(anndata_dict, layer_name='raw_counts')`

Store raw count data in a layer for each AnnData object.

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary of AnnData objects.	required
`layer_name`	`str`	Name of the layer used to store raw counts.	`'raw_counts'`

Returns:

Type	Description
`None`	The input dictionary is modified in place.

`mina.up.append_view_to_var(anndata_dict, join=':')`

Prefix feature names in each AnnData with its dict key and join separator.

This modifies the AnnData objects in-place. For example, if the key is "CM" and a gene is "gene1", the new var name becomes "CM:gene1" when join=":".

Parameters:

Name	Type	Description	Default
`anndata_dict`	`dict[str, AnnData]`	Dictionary mapping views to AnnData objects.	required
`join`	`str`	Separator placed between the view name and the feature name.	`':'`

Returns:

Type	Description
`None`	Updates `.var_names` in place.
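
The renaming rule is simple string concatenation of key, separator, and feature name. A small sketch with plain lists standing in for `.var_names` (`prefixed_var_names` is a hypothetical helper):

```python
def prefixed_var_names(var_names_by_view, join=":"):
    """Prefix every feature name with its view key and the join separator."""
    return {
        view: [f"{view}{join}{name}" for name in names]
        for view, names in var_names_by_view.items()
    }


var_names = {"CM": ["gene1", "gene2"], "FB": ["gene1"]}
renamed = prefixed_var_names(var_names, join=":")

# After prefixing, a gene shared by several views no longer collides.
all_names = [name for names in renamed.values() for name in names]
```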

`mina.up.merge_adata_views(studies: list[dict[str, AnnData]], study_names: list[str], view_mode: str = 'union', min_view_studies: int = 2, var_mode: str = 'outer', min_var_studies: int = 2) -> dict[str, AnnData]`

Merge multiple study-level AnnData dictionaries into unified views.

Parameters:

Name	Type	Description	Default
`studies`	`list[dict[str, AnnData]]`	List of study dictionaries, each mapping view names to AnnData objects.	required
`study_names`	`list[str]`	Unique identifiers for each study. Must align with `studies`.	required
`view_mode`	``{'union', 'intersection', 'min_n'}``	Strategy for selecting views across studies.	`'union'`
`min_view_studies`	`int`	Minimum number of studies required when `view_mode='min_n'`.	`2`
`var_mode`	``{'inner', 'outer', 'min_n'}``	Strategy for merging variables (features).	`'outer'`
`min_var_studies`	`int`	Minimum number of studies required when `var_mode='min_n'`.	`2`

Assumptions

- Observation columns are harmonized across studies.
- Observation names are unique across studies.
- Feature names are harmonized across studies.
- View names are consistent across studies.
- `study_names` uniquely identify studies.

Returns:

Name	Type	Description
`merged`	`dict[str, AnnData]`	Dictionary of merged AnnData objects, one per retained view.

Keys: each key corresponds to a view (modality/cell type) retained according to `view_mode` across the input studies.

Values: each value is an AnnData object resulting from concatenating the corresponding AnnData objects from all studies that contain that view. Guarantees:

- `.obs` columns: only columns present in all contributing studies
  are retained (strict intersection).
- `.obs_names` (row identifiers): all original observation names 
  are preserved; duplicates across studies are not allowed.
- `.obs["study"]`: column indicating the study of origin for each
  observation, using the names provided in ``study_names``.
- `.var_names` (features), selected according to ``var_mode``:
    * ``"inner"`` → only variables present in all contributing studies
    * ``"outer"`` → all variables present in at least one contributing study
    * ``"min_n"`` → variables present in at least ``min_var_studies`` studies
- `.uns` and other metadata are merged conservatively with unique keys.
- The resulting AnnData objects are copies; modifying them will 
  not affect the original input studies.
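
The selection rules for `view_mode` and `var_mode` follow the same pattern: count in how many studies a name occurs and compare against a required minimum. The sketch below illustrates that pattern with sets of names only. It does not concatenate AnnData objects, and `select_by_mode` is a hypothetical helper rather than part of the library.

```python
from collections import Counter


def select_by_mode(name_sets, mode, min_n=2):
    """Select names by how many of the given collections contain them."""
    counts = Counter(name for names in name_sets for name in set(names))
    if mode in ("union", "outer"):
        required = 1
    elif mode in ("intersection", "inner"):
        required = len(name_sets)
    elif mode == "min_n":
        required = min_n
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(name for name, n in counts.items() if n >= required)


# View names available in three hypothetical studies.
views_per_study = [{"CM", "FB"}, {"CM", "FB", "EC"}, {"CM"}]

union_views = select_by_mode(views_per_study, "union")
shared_views = select_by_mode(views_per_study, "intersection")
min2_views = select_by_mode(views_per_study, "min_n", min_n=2)
```

For variables, the same helper would be applied per view to the feature names of the studies that contain that view, so that ``"inner"`` refers to all contributing studies rather than all input studies.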