Compare Prior Knowledge Resources and/or Columns within a Single Resource and Generate an UpSet Plot

This function compares gene and/or metabolite features across multiple prior knowledge (PK) resources or, if a single resource is provided with a vector of column names in metadata_info, compares columns within that resource.

Usage

compare_pk(
  data,
  metadata_info = NULL,
  filter_by = c("both", "gene", "metabolite"),
  plot_name = "Overlap of Prior Knowledge Resources",
  name_col = "TrivialName",
  palette_type = "polychrome",
  save_plot = "svg",
  save_table = "csv",
  print_plot = TRUE,
  path = NULL
)

Arguments

data

A named list where each element corresponds to a prior knowledge (PK) resource. Each element can be:

A data frame containing gene/metabolite identifiers (and additional columns for within-resource comparison),
A character string indicating the resource name. Recognized names include (but are not limited to): "Hallmarks", "Gaude", "MetalinksDB", and "RAMP" (or "metsigdb_chemicalclass"). In the latter case, the function will attempt to load the corresponding data automatically.

metadata_info

A named list (with names matching those in data) where each element is either a character string or a character vector indicating the column name(s) to extract features. For multiple-resource comparisons, these refer to the columns containing feature identifiers. For within-resource comparisons, the vector should list the columns to compare (e.g., c("CHEBI", "HMDB", "LIMID")). In within-resource mode, the input data frame is expected to contain a column named "Class" (or a grouping column specified via the class_col attribute). If no grouping column is found, a default grouping column named "Group" (with all rows assigned the same value) is created.

filter_by

Character. Optional filter for the resulting features when comparing multiple resources. Options are: "both" (default), "gene", or "metabolite". This parameter is ignored in within-resource mode.

plot_name

Optional: String which is added to the output files of the Upsetplot Default = ""

palette_type

Character. Color palette to be used in the plot. Default is "polychrome".

save_plot

Optional: Select the file type of output plots. Options are svg, png, pdf. Default = svg

save_table

Optional: File types for the analysis results are: "csv", "xlsx", "txt". Default = "csv"

print_plot

Optional: TRUE or FALSE, if TRUE Volcano plot is saved as an overview of the results. Default = TRUE

path

Optional: Path to the folder the results should be saved at. Default = NULL

output_file

Character. Optional file path to save the generated plot; if NULL, the plot is not saved.

Value

A list containing two elements:

summary_table: A data frame representing either:
- the binary summary matrix of feature presence/absence across multiple resources, or
- the original data frame (augmented with binary columns and a None column) in within-resource mode.
upset_plot: The UpSet plot object generated by the function.

Details

In the multi-resource mode, each element in data represents a PK resource (either as a data frame or a recognized resource name) from which a set of features is extracted. A binary summary table is then constructed and used to create an UpSet plot.

In the within-resource mode, a single data frame is provided (with data containing one element) and its metadata_info entry is a vector of column names to compare (e.g., binary indicators for different annotations). In this case, the function expects the data frame to have a grouping column named "Class" (or, alternatively, a column specified via the class_col attribute in metadata_info) that is used for grouping in the UpSet plot.

Examples

## Example 1: Multi-Resource Comparison

# Using automatic data loading for multiple resources.
data <- list(hallmarks = hallmarks, gaude = gaude_pathways,
                metalinksdb = metsigdb_metalinks(), ramp = metsigdb_chemicalclass())

# Filtering to include only gene features:
res_genes <- MetaProViz::compare_pk(data = data, filter_by = "gene")
#> Error in MetaProViz::compare_pk(data = data, filter_by = "gene"): Column(s) feature not found in resource hallmarks

## Example 2: Within-Resource Comparison (Comparing Columns Within a Single data Frame)

# biocrates_features is a data frame with columns: "TrivialName", "CHEBI", "HMDB", "LIMID", and "Class".
# Here the "Class" column is used as the grouping variable in the UpSet plot.
data_single <- list(Biocft = biocrates_features)
metadata_info_single <- list(Biocft = c("CHEBI", "HMDB", "LIMID"))

res_single <- MetaProViz::compare_pk(data = data_single, metadata_info = metadata_info_single,
                          plot_name = "Overlap of BioCrates Columns")


## Example 3: Custom data Frames with Custom Column Names

# Example with preloaded data frames and custom column names:
hallmarks_df <- data.frame(feature = c("HMDB0001", "GENE1", "GENE2"), stringsAsFactors = FALSE)
gaude_df <- data.frame(feature = c("GENE2", "GENE3"), stringsAsFactors = FALSE)
metalinks_df <- data.frame(hmdb = c("HMDB0001", "HMDB0002"),
                           gene_symbol = c("GENE1", "GENE4"), stringsAsFactors = FALSE)
ramp_df <- data.frame(class_source_id = c("HMDB0001", "HMDB0003"), stringsAsFactors = FALSE)
data <- list(Hallmarks = hallmarks_df, Gaude = gaude_df,
                MetalinksDB = metalinks_df, RAMP = ramp_df)
metadata_info <- list(Hallmarks = "feature", Gaude = "feature",
                     MetalinksDB = c("hmdb", "gene_symbol"), RAMP = "class_source_id")
res <- MetaProViz::compare_pk(data = data, metadata_info = metadata_info, filter_by = "metabolite")