Upstream
Filtering
mina.up.filter_anndata_by_ncells(anndata_dict, min_cells)
Filters out samples (rows) from the AnnData objects in the dictionary whose number of cells (the psbulk_cells column in .obs) is below the specified threshold.
Updates the .var attribute with total counts per gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with AnnData objects as values. | required |
| min_cells | int or dict | Minimum number of cells (psbulk_cells) a sample must have to be kept. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
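
The threshold rule can be sketched in plain Python. This is a minimal sketch, not mina's implementation: filter_by_ncells is a hypothetical helper, each view is modelled as a dict mapping sample ids to their psbulk_cells value (a stand-in for AnnData.obs), and a dict-valued min_cells is assumed to hold one threshold per view name.

```python
from typing import Union


def filter_by_ncells(views: dict[str, dict[str, int]],
                     min_cells: Union[int, dict[str, int]]) -> None:
    """Drop samples whose psbulk_cells count is below the threshold, in place.

    `views` maps view name -> {sample id: psbulk_cells}; this toy layout is a
    stand-in for AnnData.obs and is an assumption of the sketch.
    """
    for view, cells in views.items():
        # Assumption: a dict threshold maps each view name to its own minimum.
        threshold = min_cells[view] if isinstance(min_cells, dict) else min_cells
        kept = {sample: n for sample, n in cells.items() if n >= threshold}
        cells.clear()
        cells.update(kept)  # mutate the inner dict so callers see the change


views = {"CM": {"s1": 5, "s2": 40}, "Fib": {"s1": 12, "s2": 3}}
filter_by_ncells(views, min_cells=10)
print(views)  # {'CM': {'s2': 40}, 'Fib': {'s1': 12}}
```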
mina.up.filter_views_by_samples(anndata_dict, min_rows)
Filters out AnnData objects in the dictionary that have fewer samples (rows) than the specified threshold.
Also updates the .var attribute to include total counts per gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with AnnData objects as values. | required |
| min_rows | int | Minimum number of rows required for an AnnData object to remain in the dictionary. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
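
Dropping whole views by sample count is a plain dictionary filter. A minimal sketch, assuming each view is represented by a list of its sample ids (the rows of an AnnData object); drop_small_views is a hypothetical name, not part of mina.

```python
def drop_small_views(views: dict[str, list[str]], min_rows: int) -> None:
    """Remove views with fewer than min_rows samples, editing `views` in place."""
    # Collect keys first: deleting from a dict while iterating over it raises RuntimeError.
    too_small = [name for name, samples in views.items() if len(samples) < min_rows]
    for name in too_small:
        del views[name]


views = {"CM": ["s1", "s2", "s3"], "Fib": ["s1"], "Endo": ["s1", "s2"]}
drop_small_views(views, min_rows=2)
print(sorted(views))  # ['CM', 'Endo']
```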
mina.up.filter_genes_byexpr(anndata_dict, min_count, min_prop)
Filters genes (columns) in each AnnData object by the proportion of samples in which they reach min_count; all samples (rows) are kept.
Also updates the .var attribute with total counts per gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with cell types as keys and AnnData objects as values. | required |
| min_count | int | Minimum count threshold for filtering genes. | required |
| min_prop | float | Minimum proportion of samples (rows) in which the count is >= min_count. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
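
The gene filter keeps a column when the share of samples reaching min_count is at least min_prop. A minimal sketch on a plain samples x genes list of lists; genes_passing_expr is a hypothetical helper, and the inclusive comparison on the proportion is an assumption (the docs only state >= for the count).

```python
def genes_passing_expr(X: list[list[float]], genes: list[str],
                       min_count: float, min_prop: float) -> list[str]:
    """Return genes with count >= min_count in at least min_prop of samples.

    X is a toy samples x genes matrix (rows = samples, columns = genes).
    Assumption: the proportion is compared inclusively (>= min_prop).
    """
    n_samples = len(X)
    kept = []
    for j, gene in enumerate(genes):
        n_hit = sum(1 for row in X if row[j] >= min_count)
        if n_hit / n_samples >= min_prop:
            kept.append(gene)
    return kept


X = [[0, 12, 3],
     [1, 15, 0],
     [0, 11, 10],
     [2, 20, 10]]
print(genes_passing_expr(X, ["g1", "g2", "g3"], min_count=10, min_prop=0.5))
# ['g2', 'g3']
```

Here g3 reaches 10 counts in exactly half of the samples, so it sits on the min_prop boundary and is kept only because the sketch compares inclusively.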
mina.up.filter_views_by_genes(anndata_dict, min_genes_per_view)
Drops AnnData objects from the dictionary that have fewer than the specified number of genes (columns) after filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with cell types as keys and AnnData objects as values. | required |
| min_genes_per_view | int | Minimum number of genes (columns) that must remain in an AnnData object for it to be kept. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
mina.up.filter_samples_by_coverage(anndata_dict, threshold, min_prop)
Filters out samples in each AnnData object that do not have a sufficient proportion of genes with values greater than the specified threshold.
The dictionary is modified in place, and .var is updated with total counts per gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with cell types as keys and AnnData objects as values. | required |
| threshold | float | Value a gene must exceed in a sample to count toward that sample's coverage. Usually left at 0. | required |
| min_prop | float | Minimum proportion of genes that must exceed the threshold for a sample to be kept. | required |
required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
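
Per-sample coverage is the fraction of genes whose value is strictly greater than threshold. A minimal sketch on a plain samples x genes matrix; samples_passing_coverage is a hypothetical helper, and treating min_prop as an inclusive lower bound is an assumption.

```python
def samples_passing_coverage(X: list[list[float]], samples: list[str],
                             threshold: float = 0.0,
                             min_prop: float = 0.5) -> list[str]:
    """Keep samples in which at least min_prop of genes exceed threshold."""
    kept = []
    for sample, row in zip(samples, X):
        covered = sum(1 for value in row if value > threshold)  # strictly greater, as documented
        if covered / len(row) >= min_prop:  # assumption: inclusive lower bound
            kept.append(sample)
    return kept


X = [[0, 5, 2, 0],   # 2 of 4 genes > 0
     [0, 0, 0, 7],   # 1 of 4
     [3, 1, 4, 1]]   # 4 of 4
print(samples_passing_coverage(X, ["s1", "s2", "s3"], threshold=0, min_prop=0.5))
# ['s1', 's3']
```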
mina.up.filter_genes_by_celltype(anndata_dict, gene_lists)
Filters out genes from AnnData objects based on provided lists of genes to exclude.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with cell types as keys and AnnData objects as values. | required |
| gene_lists | dict[str, list[str]] | Dictionary with cell types as keys and lists of genes to exclude. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input AnnData objects in place. |
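
Exclusion works per cell type: each view drops only the genes listed under its own key, and gene_lists has the same shape as the dictionary returned by get_hvgs. A minimal sketch on plain lists of gene names; exclude_genes is a hypothetical helper, and leaving views without an entry in gene_lists untouched is an assumption.

```python
def exclude_genes(var_names: dict[str, list[str]],
                  gene_lists: dict[str, list[str]]) -> None:
    """Remove each view's listed genes from its feature list, in place."""
    for view, genes in var_names.items():
        # Assumption: views missing from gene_lists keep all their genes.
        drop = set(gene_lists.get(view, []))
        genes[:] = [g for g in genes if g not in drop]  # slice assignment keeps the same list object


var_names = {"CM": ["TTN", "MYH7", "ACTB"], "Fib": ["COL1A1", "ACTB"]}
exclude_genes(var_names, {"CM": ["ACTB"]})
print(var_names)  # {'CM': ['TTN', 'MYH7'], 'Fib': ['COL1A1', 'ACTB']}
```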
mina.up.filter_smpls_by_nview(anndata_dict, min_views)
Filters out samples in AnnData objects that do not appear in at least a minimum number of views.
A sample (identified by its .obs.index) is kept only if it is present in
min_views or more AnnData objects (views). The input dictionary is updated
in place, with each AnnData object subset to the eligible samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with view or cell-type names as keys and AnnData objects as values. Sample identifiers are taken from .obs.index. | required |
| min_views | int | Minimum number of views in which a sample must be present to be retained. | required |
Returns:
| Type | Description |
|---|---|
| None | The function modifies the input dictionary in place. |
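
The retention rule counts, for each sample id, how many views contain it. A minimal sketch using collections.Counter on plain lists of sample ids (standing in for each view's .obs.index); filter_samples_by_nviews is a hypothetical name.

```python
from collections import Counter


def filter_samples_by_nviews(views: dict[str, list[str]], min_views: int) -> None:
    """Keep only samples present in at least min_views views, in place."""
    # Count the views each sample appears in; set() guards against duplicates within a view.
    counts = Counter(s for samples in views.values() for s in set(samples))
    eligible = {s for s, n in counts.items() if n >= min_views}
    for samples in views.values():
        samples[:] = [s for s in samples if s in eligible]


views = {"CM": ["p1", "p2", "p3"], "Fib": ["p1", "p3"], "Endo": ["p1", "p4"]}
filter_samples_by_nviews(views, min_views=2)
print(views)  # {'CM': ['p1', 'p3'], 'Fib': ['p1', 'p3'], 'Endo': ['p1']}
```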
mina.up.get_hvgs(anndata_dict, groupby=None, ngroups_cut=2)
Identify genes to exclude for each AnnData object, based on HVG masking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with view/cell-type keys and AnnData objects as values. | required |
| groupby | str | Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping. | None |
| ngroups_cut | int | Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None. | 2 |
Returns:
| Type | Description |
|---|---|
| dict[str, list[str]] | Dictionary with cell types as keys and lists of non-variable genes to exclude. |
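
When groupby is set, the retention rule reduces to counting, for each gene, the groups in which it is flagged as highly variable. The sketch below takes those per-group flags as given (hvg_by_group is a hypothetical input; how the flags are computed is not shown) and returns the genes to exclude for a single view. scanpy's highly_variable_genes reports a similar per-gene count in .var['highly_variable_nbatches'] when batch_key is given; whether mina relies on that column is an assumption.

```python
def genes_to_exclude(hvg_by_group: dict[str, set[str]], all_genes: list[str],
                     ngroups_cut: int = 2) -> list[str]:
    """Return genes that are highly variable in fewer than ngroups_cut groups.

    hvg_by_group maps each group (e.g. batch) to the set of genes flagged as
    highly variable within it; computing those flags is out of scope here.
    """
    n_groups_hvg = {g: sum(g in hvgs for hvgs in hvg_by_group.values()) for g in all_genes}
    return [g for g in all_genes if n_groups_hvg[g] < ngroups_cut]


hvg_by_group = {"batch1": {"g1", "g2"}, "batch2": {"g1", "g3"}, "batch3": {"g1", "g2"}}
print(genes_to_exclude(hvg_by_group, ["g1", "g2", "g3", "g4"], ngroups_cut=2))
# ['g3', 'g4']
```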
mina.up.filter_hvgs(anndata_dict, groupby=None, ngroups_cut=None)
Identify highly variable genes (HVGs) for each AnnData object and filter out non-HVGs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary with view or cell-type names as keys and AnnData objects as values. | required |
| groupby | str | Column name in .obs to group by when identifying HVGs. If None, HVGs are identified without grouping. | None |
| ngroups_cut | int | Minimum number of groups (batches) in which a gene must be highly variable to be retained. Only applicable if groupby is not None. | None |
Returns:
| Type | Description |
|---|---|
| None | The input AnnData objects are updated in place, with non-HVGs filtered out and HVG-related annotation columns dropped from .var. |
Preprocessing
mina.up.extract_metadata_from_obs(obs: pd.DataFrame, groupby: str, sort: bool = False) -> pd.DataFrame
Extract group-level metadata from an observation table.
Only columns with a single unique value per group are retained.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs | DataFrame | Observation metadata (e.g., the .obs table of an AnnData object). | required |
| groupby | str | Column used to define groups. | required |
| sort | bool | Whether to apply natural sorting to group identifiers. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Group-level metadata table. |
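
The group-level reduction can be illustrated without pandas. In this sketch, which is a stand-in rather than mina's code, obs is a hypothetical list of row dicts, group_metadata and natural_key are illustrative names, and natural sorting is done with a small regex key (an assumed technique) so that "p2" sorts before "p10". The real function returns a DataFrame; the sketch returns a dict of dicts keyed by group.

```python
import re
from collections import defaultdict


def natural_key(text: str) -> list:
    # Split "p10" into ["p", 10, ""] so digit runs compare numerically.
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", text)]


def group_metadata(rows: list[dict], groupby: str, sort: bool = False) -> dict[str, dict]:
    """Keep only columns that take a single value within every group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row)
    # Assumption: all rows share the same keys, so the first row lists every column.
    columns = [c for c in rows[0] if c != groupby]
    constant = [c for c in columns
                if all(len({r[c] for r in members}) == 1 for members in groups.values())]
    keys = sorted(groups, key=natural_key) if sort else list(groups)
    return {k: {c: groups[k][0][c] for c in constant} for k in keys}


rows = [
    {"sample": "p10", "condition": "MI", "age": 61, "cell_type": "CM"},
    {"sample": "p10", "condition": "MI", "age": 61, "cell_type": "Fib"},
    {"sample": "p2", "condition": "ctrl", "age": 45, "cell_type": "CM"},
]
print(group_metadata(rows, "sample", sort=True))
# {'p2': {'condition': 'ctrl', 'age': 45}, 'p10': {'condition': 'MI', 'age': 61}}
```

cell_type is dropped because it varies within sample p10, which is exactly the "single unique value per group" rule.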
mina.up.split_anndata_by_celltype(pdata, grouping='cell_type')
Split an AnnData object into multiple AnnData objects by cell type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pdata | AnnData | Input AnnData object. | required |
| grouping | str | Column in pdata.obs that holds the cell-type labels. | 'cell_type' |
Returns:
| Type | Description |
|---|---|
| dict[str, AnnData] | Dictionary mapping cell types to AnnData objects. |
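
Splitting by cell type amounts to grouping observation names by their value in the grouping column. A minimal stdlib sketch, where split_by_group is a hypothetical helper and obs is a plain dict of columns standing in for pdata.obs. With AnnData, each group would typically be materialised as pdata[pdata.obs[grouping] == label].copy(); whether mina does exactly this is an assumption.

```python
from collections import defaultdict


def split_by_group(obs_names: list[str], obs: dict[str, list[str]],
                   grouping: str = "cell_type") -> dict[str, list[str]]:
    """Map each value of obs[grouping] to the observation names that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, label in zip(obs_names, obs[grouping]):
        groups[label].append(name)
    return dict(groups)


obs_names = ["s1_CM", "s1_Fib", "s2_CM"]
obs = {"cell_type": ["CM", "Fib", "CM"]}
print(split_by_group(obs_names, obs))  # {'CM': ['s1_CM', 's2_CM'], 'Fib': ['s1_Fib']}
```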
mina.up.norm_log(anndata_dict, target_sum=1000000.0, exclude_highly_expressed=False, max_value=None, center=True)
Normalize, log-transform, and scale AnnData objects in place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary of AnnData objects. | required |
| target_sum | float | Target total count per sample after normalization. | 1000000.0 |
| exclude_highly_expressed | bool | Whether to exclude highly expressed genes during normalization. | False |
| max_value | float or None | Maximum value after scaling, used to clip outliers. | None |
| center | bool | Whether to center features during scaling. | True |
Returns:
| Type | Description |
|---|---|
| None | The input dictionary is modified in place. |
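
The three steps of norm_log can be sketched with the standard library. This is an illustrative sketch, not mina's code: norm_log_scale is a hypothetical name, it works on a plain samples x genes list of lists, it omits exclude_highly_expressed, it scales by the population standard deviation, and it clips symmetrically to [-max_value, max_value]; all of these are assumptions. The parameter names match scanpy's sc.pp.normalize_total, sc.pp.log1p and sc.pp.scale, but whether norm_log calls those functions is not stated here.

```python
import math
import statistics
from typing import Optional


def norm_log_scale(X: list[list[float]], target_sum: float = 1e6,
                   max_value: Optional[float] = None,
                   center: bool = True) -> list[list[float]]:
    """Return a normalized, log1p-transformed, per-gene scaled copy of X (samples x genes)."""
    # 1. Rescale each sample so its counts sum to target_sum (totals assumed > 0).
    normed = []
    for row in X:
        total = sum(row)
        normed.append([v * target_sum / total for v in row])
    # 2. Natural-log transform with log1p, i.e. log(1 + x).
    logged = [[math.log1p(v) for v in row] for row in normed]
    # 3. Scale each gene (column) to unit variance, centering it first if requested.
    columns = []
    for col in zip(*logged):
        mean = statistics.fmean(col) if center else 0.0
        sd = statistics.pstdev(col) or 1.0  # guard against constant genes
        scaled = [(v - mean) / sd for v in col]
        if max_value is not None:
            # Assumption: clip symmetrically to [-max_value, max_value].
            scaled = [min(max(v, -max_value), max_value) for v in scaled]
        columns.append(scaled)
    # Transpose back to samples x genes.
    return [list(row) for row in zip(*columns)]


Z = norm_log_scale([[1, 3], [2, 2], [5, 5]], target_sum=10)
print([round(v, 3) for v in Z[0]])
```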
Utils
mina.up.save_raw_counts(anndata_dict, layer_name='raw_counts')
Store raw count data in a layer for each AnnData object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary of AnnData objects. | required |
| layer_name | str | Name of the layer used to store raw counts. | 'raw_counts' |
Returns:
| Type | Description |
|---|---|
| None | The input dictionary is modified in place. |
mina.up.append_view_to_var(anndata_dict, join=':')
Prefix each feature name in each AnnData object with the object's dictionary key, followed by the join separator.
This modifies the AnnData objects in place. For example, if the key is "CM" and a gene is "gene1", the new var name becomes "CM:gene1" when join=":".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| anndata_dict | dict[str, AnnData] | Dictionary mapping views to AnnData objects. | required |
| join | str | Separator used between the view name and the feature name. | ':' |
Returns:
| Type | Description |
|---|---|
| None | Updates the var names of each AnnData object in anndata_dict in place. |
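
The renaming rule is a string prefix applied per view. A minimal sketch on plain lists of feature names (prefix_var_names is a hypothetical helper); on a real AnnData object the equivalent assignment would be adata.var_names = [f"{key}{join}{g}" for g in adata.var_names], though mina's exact code is not shown here. One practical effect is that the same gene measured in different views ends up with distinct names.

```python
def prefix_var_names(var_names: dict[str, list[str]], join: str = ":") -> None:
    """Prepend each view's key and the separator to its feature names, in place."""
    for view, names in var_names.items():
        names[:] = [f"{view}{join}{name}" for name in names]


var_names = {"CM": ["gene1", "gene2"], "Fib": ["gene1"]}
prefix_var_names(var_names)
print(var_names)  # {'CM': ['CM:gene1', 'CM:gene2'], 'Fib': ['Fib:gene1']}
```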
mina.up.merge_adata_views(studies: list[dict[str, AnnData]], study_names: list[str], view_mode: str = 'union', min_view_studies: int = 2, var_mode: str = 'outer', min_var_studies: int = 2) -> dict[str, AnnData]
Merge multiple study-level AnnData dictionaries into unified views.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| studies | list[dict[str, AnnData]] | List of study dictionaries, each mapping view names to AnnData objects. | required |
| study_names | list[str] | Unique identifiers for each study. Must align with studies. | required |
| view_mode | ``{'union', 'intersection', 'min_n'}`` | Strategy for selecting views across studies. | 'union' |
| min_view_studies | int | Minimum number of studies required when view_mode='min_n'. | 2 |
| var_mode | ``{'inner', 'outer', 'min_n'}`` | Strategy for merging variables (features). | 'outer' |
| min_var_studies | int | Minimum number of studies required when var_mode='min_n'. | 2 |
Assumptions:
- Observation columns are harmonized across studies.
- Observation names are unique across studies.
- Feature names are harmonized across studies.
- View names are consistent across studies.
- study_names uniquely identify studies.
Returns:
| Name | Type | Description |
|---|---|---|
| merged | dict[str, AnnData] | Dictionary of merged AnnData objects, one per retained view. Keys: each key corresponds to a view (modality/cell type) retained according to view_mode. Values: each value is an AnnData object resulting from concatenating the corresponding AnnData objects from all studies that contain that view. Guarantees: |
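
The view_mode and var_mode options read like set operations over the names present in each study. The sketch below shows that interpretation on plain sets of names; select_names is a hypothetical helper, mapping 'union'/'outer' to "present in any study" and 'intersection'/'inner' to "present in every study" is an assumption based on the option names, and the concatenation of the AnnData objects themselves is not shown. For the 'inner'/'outer' cases, anndata.concat exposes a matching join argument, though whether mina relies on it is not stated here.

```python
from collections import Counter


def select_names(per_study: list[set[str]], mode: str, min_n: int = 2) -> set[str]:
    """Pick view or feature names across studies according to mode."""
    counts = Counter(name for names in per_study for name in names)
    if mode in ("union", "outer"):
        return set(counts)  # present in at least one study
    if mode in ("intersection", "inner"):
        return {n for n, c in counts.items() if c == len(per_study)}  # present in every study
    if mode == "min_n":
        return {n for n, c in counts.items() if c >= min_n}  # present in >= min_n studies
    raise ValueError(f"unknown mode: {mode!r}")


per_study = [{"CM", "Fib"}, {"CM", "Endo"}, {"CM", "Fib"}]
print(sorted(select_names(per_study, "union")))           # ['CM', 'Endo', 'Fib']
print(sorted(select_names(per_study, "intersection")))    # ['CM']
print(sorted(select_names(per_study, "min_n", min_n=2)))  # ['CM', 'Fib']
```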