
Compare Prior Knowledge Resources and Generate an UpSet Plot

This function compares gene and/or metabolite features across multiple prior knowledge (PK) resources or, if a single resource is provided with a vector of column names in metadata_info, compares columns within that resource.

In multi-resource mode, each element of data represents a PK resource (either a data frame or a recognized resource name) from which a set of features is extracted. A binary summary table is then constructed and used to create an UpSet plot.

In within-resource mode, data contains a single data frame and its metadata_info entry is a vector of column names to compare (e.g., binary indicators for different annotations). In this case, the function expects the data frame to have a grouping column named "Class" (or, alternatively, a column specified via the class_col attribute in metadata_info) that is used for grouping in the UpSet plot.
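The binary summary table used in multi-resource mode can be pictured with a small base R sketch. This is not the package's implementation; the resource names and feature identifiers are made up for illustration, and the actual column layout returned by compare_pk may differ.

```r
# Hypothetical feature sets, one per resource (illustrative values only).
feature_sets <- list(
  ResourceA = c("HMDB0001", "GENE1", "GENE2"),
  ResourceB = c("GENE2", "GENE3"),
  ResourceC = c("HMDB0001", "HMDB0002")
)

# Union of all features, in order of first appearance.
all_features <- unique(unlist(feature_sets, use.names = FALSE))

# One 0/1 column per resource: 1 if the feature occurs in that resource.
membership <- vapply(
  feature_sets,
  function(set) as.integer(all_features %in% set),
  integer(length(all_features))
)

# Assumed layout: one row per feature, one indicator column per resource.
binary_table <- data.frame(
  Feature = all_features,
  membership,
  stringsAsFactors = FALSE
)
print(binary_table)
```

A table of this shape (one row per feature, one indicator column per resource) is the kind of input an UpSet plot is typically drawn from.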

Usage

compare_pk(
  data,
  metadata_info = NULL,
  filter_by = c("both", "gene", "metabolite"),
  plot_name = "Overlap of Prior Knowledge Resources",
  name_col = "TrivialName",
  palette_type = "polychrome",
  save_plot = "svg",
  save_table = "csv",
  print_plot = TRUE,
  path = NULL
)

Arguments

data

A named list where each element corresponds to a prior knowledge (PK) resource. Each element can be:

  • A data frame containing gene/metabolite identifiers (and additional columns for within-resource comparison),

  • A character string indicating the resource name. Recognized names include (but are not limited to): "Hallmarks", "Gaude", "MetalinksDB", and "RAMP" (or "metsigdb_chemicalclass"). In the latter case, the function will attempt to load the corresponding data automatically.

metadata_info

A named list (with names matching those in data) where each element is either a character string or a character vector indicating the column name(s) to extract features. For multiple-resource comparisons, these refer to the columns containing feature identifiers. For within-resource comparisons, the vector should list the columns to compare (e.g., c("CHEBI", "HMDB", "LIMID")). In within-resource mode, the input data frame is expected to contain a column named "Class" (or a grouping column specified via the class_col attribute). If no grouping column is found, a default grouping column named "Group" (with all rows assigned the same value) is created.
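The grouping-column fallback described above can be sketched as follows. The helper resolve_group_col is hypothetical and not part of the package; it takes the grouping column name as a plain argument rather than reading it from metadata_info, and the fill value "All" is an assumption (the documentation only states that every row receives the same value).

```r
# Hypothetical helper (not a package function): choose the grouping column,
# creating a constant "Group" column when none is available.
resolve_group_col <- function(df, class_col = "Class") {
  if (class_col %in% names(df)) {
    return(list(data = df, group_col = class_col))
  }
  # No grouping column found: put every row into one shared group.
  # The value "All" is an assumption made for this sketch.
  df$Group <- "All"
  list(data = df, group_col = "Group")
}

with_class <- data.frame(CHEBI = c(1, 0), Class = c("Lipid", "Amino acid"))
without_class <- data.frame(CHEBI = c(1, 0), HMDB = c(0, 1))

resolve_group_col(with_class)$group_col     # existing "Class" column is used
resolve_group_col(without_class)$group_col  # falls back to "Group"
```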

filter_by

Character. Optional filter for the resulting features when comparing multiple resources. Options are: "both" (default), "gene", or "metabolite". This parameter is ignored in within-resource mode.

plot_name

Optional: String which is added to the output files of the UpSet plot. Default = "Overlap of Prior Knowledge Resources"

name_col

Optional: Name of the column containing the feature names. Default = "TrivialName"

palette_type

Character. Color palette to be used in the plot. Default is "polychrome".

save_plot

Optional: File type of the output plots. Options are: "svg", "png", "pdf". Default = "svg"

save_table

Optional: File types for the analysis results are: "csv", "xlsx", "txt". Default = "csv"

print_plot

Optional: TRUE or FALSE. If TRUE, the UpSet plot is saved as an overview of the results. Default = TRUE

path

Optional: Path to the folder where the results should be saved. Default = NULL

Value

A list containing two elements:

  • summary_table: A data frame representing either:

    • the binary summary matrix of feature presence/absence across multiple resources, or

    • the original data frame (augmented with binary columns and a None column) in within-resource mode.

  • upset_plot: The UpSet plot object generated by the function.
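For within-resource mode, the augmented summary table can be approximated with the base R sketch below. It rests on two assumptions that the documentation does not confirm: an identifier counts as present when it is neither NA nor an empty string, and the None column flags rows with no identifier in any compared column. The data frame and its values are invented for illustration.

```r
# Illustrative input: identifier columns to compare within one resource.
ids <- data.frame(
  TrivialName = c("met_a", "met_b", "met_c"),
  CHEBI = c("CHEBI:1", NA, NA),
  HMDB = c("HMDB0001", "HMDB0002", ""),
  stringsAsFactors = FALSE
)
compare_cols <- c("CHEBI", "HMDB")

# Assumption: "present" means not NA and not an empty string.
present <- lapply(
  ids[compare_cols],
  function(x) as.integer(!is.na(x) & nzchar(x))
)

# Replace the identifier columns with their 0/1 indicators.
summary_tbl <- ids
summary_tbl[compare_cols] <- present

# Assumption: None marks rows that have no identifier in any compared column.
summary_tbl$None <- as.integer(rowSums(summary_tbl[compare_cols]) == 0)
print(summary_tbl)
```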

Examples

if (FALSE) { # \dontrun{
## Example 1: Within-Resource Comparison
## (Comparing Columns Within a Single Data Frame)

# biocrates_features is a data frame with columns:
# "TrivialName", "CHEBI", "HMDB", "LIMID", and "Class".
# Here the "Class" column is used as the grouping variable
# in the UpSet plot.
data(biocrates_features)
data_single <- list(Biocft = biocrates_features)
metadata_info_single <- list(Biocft = c("CHEBI", "HMDB", "LIMID"))

res_single <- compare_pk(
  data = data_single,
  metadata_info = metadata_info_single,
  plot_name = "Overlap of BioCrates Columns"
)

## Example 2: Custom Data Frames with Custom Column Names

# Example with preloaded data frames and custom column names:
hallmarks_df <- data.frame(
  feature = c("HMDB0001", "GENE1", "GENE2"),
  stringsAsFactors = FALSE
)
gaude_df <- data.frame(
  feature = c("GENE2", "GENE3"),
  stringsAsFactors = FALSE
)
metalinks_df <- data.frame(
  hmdb = c("HMDB0001", "HMDB0002"),
  gene_symbol = c("GENE1", "GENE4"),
  stringsAsFactors = FALSE
)
ramp_df <- data.frame(
  class_source_id = c("HMDB0001", "HMDB0003"),
  stringsAsFactors = FALSE
)
data <- list(
  Hallmarks = hallmarks_df, Gaude = gaude_df,
  MetalinksDB = metalinks_df, RAMP = ramp_df
)
metadata_info <- list(
  Hallmarks = "feature", Gaude = "feature",
  MetalinksDB = c("hmdb", "gene_symbol"),
  RAMP = "class_source_id"
)
res <- compare_pk(
  data = data, metadata_info = metadata_info,
  filter_by = "metabolite"
)
} # }