Skip to contents


A Consumption-Release (CoRe) metabolomics experiment usually refers to a cell culture experiment where metabolomics is performed on the cell culture media.

In this tutorial we showcase how to use MetaProViz:

  • To process raw peak data and identify outliers.
  • To perform differential metabolite analysis (DMA) to generate Log2Distance and statistics and perform pathway analysis using Over Representation Analysis (ORA) on the results.
  • To do metabolite clustering analysis (MCA) to find clusters of metabolites with similar behaviors and perform pathway analysis using ORA on each cluster.
  • To use specific visualizations to aid biological interpretation of the results.


    First if you have not done yet, install the required dependencies and load the libraries:
# 1. Install Rtools if you haven’t done this yet, using the appropriate version (e.g.windows or macOS).
# 2. Install the latest development version from GitHub using devtools
#devtools::install_github("https://github.com/saezlab/MetaProViz")

library(MetaProViz)
#> Error in get(paste0(generic, ".", class), envir = get_method_env()) : 
#>   object 'type_sum.accel' not found

#dependencies that need to be loaded:
library(magrittr)
library(dplyr)
library(tibble)
library(rlang)
library(ggfortify)

#Please install the Biocmanager Dependencies:
#BiocManager::install("clusterProfiler")
#BiocManager::install("EnhancedVolcano")


1. Loading the example data

Here we choose an example datasets, which is publicly available on metabolomics workbench project PR001418 including metabolic profiles of human renal epithelial cells HK2 and cell renal cell carcinoma (ccRCC) cell lines cultured in Plasmax cell culture media. Here we use the integrated raw peak data as example data using the trivial metabolite name in combination with the KEGG ID as the metabolite identifiers.

As part of the MetaProViz package you can load the example data into your global environment using the function toy_data():

1. CoRe experiment (CoRe)
The raw data are available via metabolomics workbench study ST002226 were exometabolomics of HK2 and ccRCC cell lines 786-O, 786-M1A, 786-M2A, OS-RC-2, OS-LM1 and RFX-631 were performed.

Media <- MetaProViz::ToyData(Data="CultureMedia_Raw")
Preview of the DF CoRe including columns with sample information and metabolite ids with their measured values.
Conditions Biological_Replicates GrowthFactor valine-d8 hipppuric acid-d5 2-hydroxyglutarate 2-ketoglutarate 3-Dehydro-L-threonate
MS51-06 HK2 1 249.2817 780552871 3127630257 84950547 169244158 1807489245
MS51-07 HK2 2 249.2817 802602348 3256031922 60753859 151064767 695228424
MS51-08 HK2 3 249.2817 831984796 3308009345 73718363 171281531 791442407
MS51-09 HK2 4 249.2817 822744518 3209731571 65933166 112033043 315209589
MS51-10 HK2 5 249.2817 805565867 3297793480 68183576 170902744 615035216
MS51-11 786-O 1 297.3423 841873509 3418515398 75941661 215553005 501977089
MS51-12 786-O 2 297.3423 825462964 3218049751 62100210 195308040 484681099


2. Additional information mapping the trivial metabolite names to KEGG IDs and selected pathways (MappingInfo)

MappingInfo <- MetaProViz::ToyData(Data="Cells_MetaData")
Preview of the DF Pathways including the trivial metabolite identifiers used in the experiment as well as KEGG IDs and pathway information.
HMDB KEGG.ID KEGGCompound Pathway
N-acetylaspartate HMDB0000812 C01042 N-Acetyl-L-aspartate Alanine, aspartate and glutamate metabolism
argininosuccinate HMDB0000052 C03406 N-(L-Arginino)succinate Alanine, aspartate and glutamate metabolism
N-acetylaspartylglutamate HMDB0001067 C12270 N-Acetylaspartylglutamate Alanine, aspartate and glutamate metabolism
tyrosine HMDB0000158 C00082 L-Tyrosine Amino acid metabolism
asparagine HMDB0000168 C00152 L-Asparagine Amino acid metabolism


3. KEGG pathways that are loaded via KEGG API using the package KEGGREST and can be used to perform pathway analysis. (KEGG_Pathways)

#This will use KEGGREST to query the KEGG API to load the pathways:
MetaProViz::LoadKEGG()
#> Cached file loaded from: ~/.cache/KEGG_Metabolite.rds
Preview of the DF KEGG_Pathways.
term Metabolite MetaboliteID Description
Glycolysis / Gluconeogenesis - Homo sapiens (human) Pyruvate C00022 Glycolysis / Gluconeogenesis - Homo sapiens (human)
Glycolysis / Gluconeogenesis - Homo sapiens (human) Acetyl-CoA C00024 Glycolysis / Gluconeogenesis - Homo sapiens (human)
Glycolysis / Gluconeogenesis - Homo sapiens (human) D-Glucose C00031 Glycolysis / Gluconeogenesis - Homo sapiens (human)
Glycolysis / Gluconeogenesis - Homo sapiens (human) Acetate C00033 Glycolysis / Gluconeogenesis - Homo sapiens (human)
Glycolysis / Gluconeogenesis - Homo sapiens (human) Oxaloacetate C00036 Glycolysis / Gluconeogenesis - Homo sapiens (human)


2. Run MetaProViz Analysis

Currently, MetaProViz contains four different modules, which include different methods and can be used independently from each other or in combination (see introduction for more details). Here we will go trough each of those modules and apply them to the example data.

Pre-processing

MetaProViz includes a pre-processing module with the function Preprocessing() that has multiple parameters to perform customize data processing.
Feature_Filtering applies the 80%-filtering rule on the metabolite features either on the whole dataset (=“Standard”) (Bijlsma et al. 2006) or per condition (=“Modified”) (Wei et al. 2018). This means that metabolites are removed were more than 20% of the samples (all or per condition) have no detection. In case of the CoRe experiment, the blank samples are ignored during feature filtering, since often metabolites are released from a cell and not naturally present in the culture media leading to no detection in the blank. With the parameter Feature_Filt_Value we enable the adaptation of the stringency of the filtering based on the experimental context. For instance, patient tumour samples can contain many unknown subgroups due to gender, age, stage etc., which leads to a metabolite being detected in only 50% (or even less) of the tumour samples, hence in this context it could be considered to change the Feature_Filt_Value from the default (=0.8). If Feature_Filtering = "None", no feature filtering is performed. In the context of Feature_Filtering it is also noteworthy that the function Pool_Estimation() can be used to estimate the quality of the metabolite detection and will return a list of metabolites that are variable across the different pool measurements (pool = mixture of all experimental samples measured several times during the LC-MS run) . Variable metabolite in the pool sample should be removed from the data.
The parameter TIC_Normalization refers to Total Ion Count (TIC) normalisation, which is often used with LC-MS derived metabolomics data. If TIC_Normalization = TRUE, each feature (=metabolite) in a sample is divided by the sum of all intensity value (= total number of ions) for the sample and finally multiplied by a constant ( = the mean of all samples total number of ions). Noteworthy, TIC normalisation should not be used with small number of features (= metabolites), since TIC assumes that on “average” the ion count of each sample is equal if there were no instrument batch effects (Wulff and Mitchell 2018).
The parameter MVI refers to Missing Value Imputation (MVI) and if MVI = TRUE half minimum (HM) missing value imputation is performed per feature (= per metabolite). Here it is important to mention that HM has been shown to perform well for missing vales that are missing not at random (MNAR) (Wei et al. 2018).
Lastly, the function Preprocessing() performs outlier detection and adds a column “Outliers” into the DF, which can be used to remove outliers. The parameter HotellinsConfidence can be used to choose the confidence interval that should be used for the Hotellins T2 outlier test (Hotelling 1931).

Since our example data contains pool samples, we will do Pool_Estimation() before applying the Preprocessing() function. This is important, since one should remove the features (=metabolites) that are too variable prior to performing any data transformations such as TIC as part of the Preprocessing() function.
It is worth mentioning that the Coefficient of variation (CV) is calculated by dividing the standard deviation (SD) by the mean. Hence CV depends on the SD, which in turn works for normally distributed data.

Pool_Estimation_result<- MetaProViz::PoolEstimation(InputData = Media[,-c(1:3)],
                                                    SettingsFile_Sample = Media[,1:3],
                                                    SettingsInfo = c(PoolSamples = "Pool", Conditions="Conditions"),
                                                    CutoffCV = 30)

Pool_Estimation_result_DF_CV <-Pool_Estimation_result[["DF"]][["CV"]]




Preview of the Pool_Estimation result.
Metabolite CV HighVar MissingValuePercentage
valine-d8 1.744198 FALSE 0
hipppuric acid-d5 1.253983 FALSE 0
2-hydroxyglutarate 7.199141 FALSE 0
2-ketoglutarate 2.766989 FALSE 0
3-Dehydro-L-threonate 9.222780 FALSE 0


The results from the Pool_Estimation() is a table that has the CVs. If there is a high variability, one should consider to remove those features from the data. For the example data nothing needs to be removed. If you have used internal standard in your experiment you should specifically check their CV as this would indicate technical issues (here valine-d8 and hippuric acid-d5).

Now we will apply the Preprocessing() function to the example data and have a look at the output produced. You will notice that all the chosen parameters and results are documented in messages. All the results data tables, the Quality Control (QC) plots and outlier detection plots are returned and can be easily viewed. Importantly, here we are able to specify that we have a CoRe experiment setting the parameter CoRe=TRUE, in which case a few additional data processing steps are applied:
1. Blank sample: This refers to media samples where no cells have been cultured in, which will be used as blank. In detail, the mean of the blank sample of a feature (= metabolite) will be substracted from the values measured in each sample for the same feature. In the column “Condition” of the Experimental_design DF, you will need to label your blank samples with “blank”.
2. Growth factor or growth rate: This refers to the different conditions and is either based on cell count or protein quantification at the start of the experiment (t0) and at the end of the experiment (t1) resulting in the growth factor (t1/t0). Otherwise, one can experimentally estimate the growth rate of each condition. Ultimately, this measure is used to normalize the data, since the amount of growth will impact the consumption and release of metabolites from the media and hence we need to account for this. If you do not have this information, this will be set to 1, yet be aware that this may affect the results.

You can pass these additional information via the parameter Input_SettingsInfo, by passing the column name for the CoRe_norm_factor in the Input_SettingsFile and the condition name for the CoRe_media in the Input_data file.

#Prepare the input:
Media_input <- Media%>%
  subset(!Conditions=="Pool", select = -c(1:3))#remove pool samples and remove the information columns

Media_Metadata <- Media%>%
  subset(!Conditions=="Pool", select = c(1:3))#remove pool samples and keep the information columns only

PreProcessing_res <-  MetaProViz::PreProcessing(InputData=Media_input,
                                                SettingsFile_Sample =Media_Metadata,
                                                SettingsInfo = c(Conditions = "Conditions",
                                                                       Biological_Replicates = "Biological_Replicates",
                                                                       CoRe_norm_factor = "GrowthFactor",
                                                                       CoRe_media = "blank"),
                                                FeatureFilt = "Modified",
                                                FeatureFilt_Value = 0.8,
                                                TIC = TRUE,# As we have raw data we will perform total ion count norm
                                                MVI=TRUE, #We assume the values are not missing at random and perform half minimum MVI
                                                MVI_Percentage=50,
                                                HotellinsConfidence = 0.99,# We perform outlier testing using 0.99 confidence interval
                                                CoRe = TRUE)

# Now we can have a look at the results table:
Media_Preprocessed <-  PreProcessing_res[["DF"]][["Preprocessing_output"]]
#> For Consumption Release experiment we are using the method from Jain M.  REF: Jain et. al, (2012), Science 336(6084):1040-4, doi: 10.1126/science.1218595.
#> FeatureFiltering: Here we apply the modified 80%-filtering rule that takes the class information (Column `Conditions`) into account, which additionally reduces the effect of missing values (REF: Yang et. al., (2015), doi: 10.3389/fmolb.2015.00004). Filtering value selected: 0.8
#> 3 metabolites where removed: N-acetylaspartylglutamate, hypotaurine, S-(2-succinyl)cysteine
#> Missing Value Imputation: Missing value imputation is performed, as a complementary approach to address the missing value problem, where the missing values are imputing using the `half minimum value`. REF: Wei et. al., (2018), Reports, 8, 663, doi:https://doi.org/10.1038/s41598-017-19120-0
#> NA values were found in Control_media samples for metabolites. For metabolites including NAs MVI is performed unless all samples of a metabolite are NA.
#> Metabolites with high NA load (>20%) in Control_media samples are: dihydroorotate.
#> Metabolites with only NAs (=100%) in Control_media samples are: hydroxyphenylpyruvate. Those NAs are set zero as we consider them true zeros
#> Total Ion Count (TIC) normalization: Total Ion Count (TIC) normalization is used to reduce the variation from non-biological sources, while maintaining the biological variation. REF: Wulff et. al., (2018), Advances in Bioscience and Biotechnology, 9, 339-351, doi:https://doi.org/10.4236/abb.2018.98022
#> 8 of variables have high variability (CV > 30) in the CoRe_media control samples. Consider checking the pooled samples to decide whether to remove these metabolites or not.
#> Warning in CoReNorm(InputData = TICRes, SettingsFile_Sample =
#> SettingsFile_Sample, : The CoRe_media samples MS51-06 were found to be
#> different from the rest. They will not be included in the sum of the CoRe_media
#> samples.
#> CoRe data are normalised by substracting mean (blank) from each sample and multiplying with the CoRe_norm_factor
#> Outlier detection: Identification of outlier samples is performed using Hotellin's T2 test to define sample outliers in a mathematical way (Confidence = 0.99 ~ p.val < 0.01) (REF: Hotelling, H. (1931), Annals of Mathematical Statistics. 2 (3), 360–378, doi:https://doi.org/10.1214/aoms/1177732979). HotellinsConfidence value selected: 0.99
#> There are possible outlier samples in the data
#> Filtering round  1  Outlier Samples:  MS51-06
#> Filtering round  2  Outlier Samples:  MS51-09




Preview of the pre-processing results, which has an additional column Outlier including the results of Hotellins T2.
Conditions Biological_Replicates GrowthFactor Outliers valine-d8 hipppuric acid-d5 2-hydroxyglutarate 2-ketoglutarate 3-Dehydro-L-threonate
MS51-06 HK2 1 249.2817 Outlier_filtering_round_1 6622682287 26658050746 5853665424 7527471565 318856944286
MS51-07 HK2 2 249.2817 no 5049281744 30218506574 -868601575 1530912430 28673393701
MS51-08 HK2 3 249.2817 no -1441406385 -11561761745 1108346025 3684365990 39313101752
MS51-09 HK2 4 249.2817 Outlier_filtering_round_2 2563904070 -10237061776 -189633550 -9097094735 -67782920598
MS51-10 HK2 5 249.2817 no -6171779428 -8419683053 -50110899 3881816314 -203574855


In the output table you can now see the column “Outliers” and for the Condition HK2 CCM, we can see that based on Hotellin’s T2 test, samples were detected as outliers in the first and second round of filtering.
As part of the Preprocessing() function several plots are generated and saved. Additionally, the ggplots are returned into the list to enable further modifiaction using the ggplot syntax. These plots include plots showing the outliers for each filtering round and other QC plots.

As part of the MetaProViz visualization module one can easily further customize the PCA plot and adapt color and shape for the information of interest. You can see more below for the VizPCA() function.
Before we proceed, we will remove the outlier:

Media_Preprocessed <-Media_Preprocessed%>%
  subset(!Outliers=="Outlier_filtering_round_1")


In metabolomics, sometimes samples are injected (=measured) several times, which can be termed as analytical replicates. The MetaProViz pre-processing module includes the function ReplicateSum(), which will summarize those and save the results.

DMA

Differential Metabolite Analysis (DMA) between two conditions (e.g. Tumour versus Healthy) usually calculates the Log2FC, p-value, adjusted p-value and t-value. Yet, in a CoRe experiment the normalized metabolite values can be either a negative value, if the metabolite has been consumed from the media, or a positive value, if the metabolite has been released from the cell into the culture media. Since we can not calculate a Log2FC using negative values, we calculate the absolute difference between the mean of Condition 1 versus the mean of Condition 2. The absolute difference is log2 transformed in order to make the values comparable between the different metabolites, resulting in the Log2Dist. The result doesn’t consider whether one product is larger than the other; it only looks at the magnitude of their difference. To reflect the direction of change between the two conditions we multiply with -1 if C1 < C2. By setting the paramteter CoRe = TRUE, instead of calclulating the Log2FC, the Log2 Distance is calculated.
With the different parameters STAT_pval and STAT_padj one can choose the statistical tests such as t.test, wilcoxon test, limma, annova, kruskal walles, etc. (see function reference for more information).
As input one can use the pre-processed data we have generated using the Preprocessing module, but here one can of course use any DF including metabolite values, even though we recommend to normalize the data and remove outliers prior to DMA. Moreover, we require the Input_SettingsFile_Sample including the sample metadata with information which condition a sample corresponds to. Additionally, we enable the user to provide a Plot_SettingsFile_Metab containing the metadata for the features (metabolites), such as KEGG ID, pathway, retention time, etc.

By defining the numerator and denominator as part of the Input_SettingsInfo parameter, it is defined which comparisons are performed:
1. one_vs_one (single comparison): numerator=“Condition1”, denominator =“Condition2”
2. all_vs_one (multiple comparison): numerator=NULL, denominator =“Condition”
3. all_vs_all (multiple comparison): numerator=NULL, denominator =NULL (=default)

As input we will use the pre-processed data we have generated using the Preprocessing module, but here one can of course use any DF including metabolite values and information about the conditions that should be compared (even though we recommend to normalize the data and remove outliers prior to DMA).

In the example data we have seven different cell lines, healthy (HK2) and cancer (ccRCC: 786-M1A, 786-M2A, 786-O, OSRC2, OSLM1B and RFX631) and hence we can perform multiple different comparisons. The results can be automatically saved and all the results are returned in a list with the different data frames. If parameter Plot=TRUE, an overview Volcano plot is generated and saved.

# Perform multiple comparison All_vs_One using annova:
DMA_Annova <-  MetaProViz::DMA(InputData=Media_Preprocessed[,-c(1:6)],
                               SettingsFile_Sample=Media_Preprocessed[,c(1:4)],
                               SettingsInfo = c(Conditions="Conditions", Numerator=NULL, Denominator = "HK2"),
                               StatPval ="aov",
                               StatPadj="fdr",
                               SettingsFile_Metab = MappingInfo,
                               CoRe=TRUE)

#Inspect the DMA results tables:
DMA_786M1A_vs_HK2 <- DMA_Annova[["DMA"]][["786-M1A_vs_HK2"]]
Shapiro <- DMA_Annova[["ShapiroTest"]][["DF"]][["Shapiro_result"]]
#> There are no NA/0 values
#> For the condition HK2 82.35 % of the metabolites follow a normal distribution and 17.65 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-O 95.71 % of the metabolites follow a normal distribution and 4.29 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M1A 97.14 % of the metabolites follow a normal distribution and 2.86 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M2A 88.57 % of the metabolites follow a normal distribution and 11.43 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSRC2 92.86 % of the metabolites follow a normal distribution and 7.14 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSLM1B 85.71 % of the metabolites follow a normal distribution and 14.29 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition RFX631 97.14 % of the metabolites follow a normal distribution and 2.86 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For 67.65% of metabolites the group variances are equal.
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> We added +1 to the mean value of metabolite(s) , since the mean of the replicate values where 0. This was not due to missing values (NA/0).
#> No condition was specified as numerator and HK2 was selected as a denominator. Performing multiple testing `all-vs-one` using aov.




  1. Preview of the Shaprio results for the different conditions.
Code Metabolites with normal distribution [%] Metabolites with not-normal distribution [%] Shapiro p.val(2-hydroxyglutarate) Shapiro p.val(2-ketoglutarate)
HK2 82.35 17.65 0.6833007 0.0446492
786-O 95.71 4.29 0.4938675 0.3823712
786-M1A 97.14 2.86 0.9979050 0.1384610
786-M2A 88.57 11.43 0.5546558 0.5369470
OSRC2 92.86 7.14 0.8899005 0.2007242
OSLM1B 85.71 14.29 0.4643014 0.9022803
RFX631 97.14 2.86 0.9292099 0.0247568
  1. Preview of the DMA results for the comparison of 786-M1A versus HK2 cells.
Metabolite Log2(Distance) p.adj t.val Mean_786-M1A CoRe_786-M1A Mean_HK2 CoRe_HK2 CoRe_specific CoRe MS51-16 MS51-17 MS51-18 MS51-19 MS51-20 MS51-07 MS51-08 MS51-09 MS51-10 HMDB KEGG.ID KEGGCompound Pathway
aconitate -32.58145 0.0000000 6426780519 -6426780518.51619 Consumed -9.31322574615479e-07 Consumed Consumed Consumed -5865088200 -6431671420 -6503096254 -6848155621 -6485891098 1186119568 -610503914 -406215498 -169400156 HMDB0000072 C00417 cis-Aconitate Citrate cycle (TCA cycle)
arginine 36.63197 0.9411031 -106493120308 106493120307.837 Released 5.7220458984375e-05 Released Released Released 216146922501 123584771692 -184582255060 137947177034 239368985373 124885985748 174141953768 -154239029796 -144788909720 HMDB0000517 C00062 L-Arginine Amino acid metabolism
aspartate -34.35422 0.0000000 21960976607 -21960976606.516 Consumed -1.72853469848633e-06 Consumed Consumed Consumed -20078854995 -23024873124 -23695204780 -22469858974 -20536091160 1216173971 2386511960 -2796694643 -805991288 HMDB0000191 C00049 L-Aspartate Amino acid metabolism
betaine 39.09360 0.0017858 -586606639559 586606639558.529 Released 1.9073486328125e-06 Released Released Released 688358341457 490864850765 657311755851 724283773821 372214475898 222640096432 -24221141664 -466988404785 268569450018 HMDB0000043 C00719 Betaine Not assigned
carbamoyl phosphate 29.63461 0.5228459 -833502731 833502730.89944 Released 9.23871994018555e-07 Released Released Released 544741981 632898115 1989818101 816820279 183235178 200618839 -354809141 549923002 -395732700 HMDB0001096 C00169 Carbamoyl phosphate Purine metabolism


Using the DMA results, we can now use the MetaProViz visualization module and generate further customized Volcano plots VizVolcano(). You can see some examples below.

ORA using the DMA results

Over Representation Analysis (ORA) is a pathway enrichment analysis (PEA) method that determines if a set of features (=metabolic pathways) are over-represented in the selection of features (=metabolites) from the data in comparison to all measured features (metabolites) using the Fishers exact test. The selection of metabolites are usually the most altered metabolites in the data, which can be selected by the top and bottom t-values. Given that for CoRe data it is important to consider weather a metabolite was consumed or released, it is sensible to perform ORA on each metabolite cluster.
Of course, there are many other PEA methods such as the well known GSEA. Here we do not aim to provide an extensive tool for different methods to perform pathway enrichment analysis and only focus on ORA since we can apply this to perform standard pathway enrichment as well as pathway enrichment on clusters of metabolites. If you are interested in using different pathway enrichment methods please check out specialized tools such as decopupleR (Badia-I-Mompel et al. 2022).

Here we will use the KEGG pathways (Kanehisa and Goto 2000). Before we can perform ORA on the DMA results, we have to ensure that the metabolite names match with the KEGG IDs or KEGG trivial names. In general, the PathwayFile requirements are column “term”, “Metabolite” and “Description”, and the Input_data requirements are column “t.val” and column “Metabolite”.

#Since we have performed multiple comparisons (all_vs_HK2), we will run ORA for each of this comparison
DM_ORA_res<- list()

comparisons <- names(DMA_Annova[["DMA"]])
for(comparison in comparisons){
  #Ensure that the Metabolite names match with KEGG IDs or KEGG trivial names.
  DMA <- DMA_Annova[["DMA"]][[comparison]]
  DMA <- DMA[complete.cases(DMA),-1]%>%#we remove metabolites that do not have a KEGG ID/KEGG pathway
    tibble::remove_rownames()%>%
    column_to_rownames("KEGGCompound")#We use the KEGG trivial names to match with the KEGG pathways

  #Perform ORA: Here we use
  DM_ORA_res[[comparison]] <- MetaProViz::ClusterORA(InputData=DMA,
                                                     SettingsInfo=c(ClusterColumn="CoRe_specific", PathwayTerm= "term", PathwayFeature= "Metabolite"),
                                                     RemoveBackground=FALSE,#we do not have any background
                                                     PathwayFile=KEGG_Pathways,
                                                     PathwayName="KEGG",
                                                     minGSSize=3,
                                                     maxGSSize=1000)
}
#> Number of metabolites in cluster `Released in 786-M1A and Consumed HK2`: 10
#> 
#> Number of metabolites in cluster `Consumed in 786-M1A  and Released HK2`: 24
#> Number of metabolites in cluster `Consumed`: 14
#> Number of metabolites in cluster `Released`: 10
#> Number of metabolites in cluster `No Change`: 1
#> Number of metabolites in cluster `Released in 786-M2A and Consumed HK2`: 10
#> Number of metabolites in cluster `Consumed in 786-M2A  and Released HK2`: 26
#> Number of metabolites in cluster `Consumed`: 14
#> Number of metabolites in cluster `Released`: 8
#> Number of metabolites in cluster `No Change`: 1
#> Number of metabolites in cluster `Released in 786-O and Consumed HK2`: 12
#> Number of metabolites in cluster `Consumed in 786-O  and Released HK2`: 25
#> Number of metabolites in cluster `Consumed`: 12
#> Number of metabolites in cluster `Released`: 9
#> Number of metabolites in cluster `No Change`: 1
#> Number of metabolites in cluster `Released in OSLM1B and Consumed HK2`: 12
#> Number of metabolites in cluster `Consumed`: 12
#> Number of metabolites in cluster `Consumed in OSLM1B  and Released HK2`: 16
#> Number of metabolites in cluster `Released`: 18
#> Number of metabolites in cluster `No Change`: 1
#> Number of metabolites in cluster `Released in OSRC2 and Consumed HK2`: 11
#> Number of metabolites in cluster `Consumed`: 13
#> Number of metabolites in cluster `Consumed in OSRC2  and Released HK2`: 25
#> Number of metabolites in cluster `Released`: 9
#> Number of metabolites in cluster `No Change`: 1
#> Number of metabolites in cluster `Released in RFX631 and Consumed HK2`: 6
#> Number of metabolites in cluster `Consumed`: 18
#> Number of metabolites in cluster `Released`: 10
#> Number of metabolites in cluster `Consumed in RFX631  and Released HK2`: 24
#> Number of metabolites in cluster `No Change`: 1

#Lets check how the results look like:
MC_ORA_786M1A_vs_HK2_Consumed <- DM_ORA_res[["786-M1A_vs_HK2"]][["DF"]][["Consumed"]]
Preview of the ORA results for the comparison of 786-M1A versus HK2 cells focusing on pathways enriched in consumed metabolites.
GeneRatio BgRatio pvalue p.adjust qvalue Metabolites_in_pathway Count Metabolites_in_Pathway Percentage_of_Pathway_detected
3/13 3/52 0.0129412 0.2394118 0.2315789 L-Aspartate/L-Histidine/Pantothenate 3 32 9.38
3/13 3/52 0.0129412 0.2394118 0.2315789 (9Z)-Octadecenoic acid/Hexadecanoic acid/Octadecanoic acid 3 58 5.17
3/13 4/52 0.0438415 0.5407123 0.5230218 (9Z)-Octadecenoic acid/Hexadecanoic acid/Octadecanoic acid 3 74 4.05
2/13 4/52 0.2573349 0.7934494 0.7674902 Glycine/5-Oxoproline 2 37 5.41
2/13 4/52 0.2573349 0.7934494 0.7674902 L-Aspartate/L-Histidine 2 47 4.26

MCA

Metabolite Clustering Analysis (MCA) is a module, which includes different functions to enable clustering of metabolites into groups based on logical regulatory rules. This can be particularly useful if one has multiple conditions and aims to find patterns in the data.

MCA_CoRe

This metabolite clustering method is based on logical regulatory rules to sort metabolites into metabolite clusters. Here you additionally need intracellular samples corresponding to the CoRe samples.
Here we will define if a feature (= metabolite) is assigned into:
1. “UP”, which means a metabolite is significantly up-regulated in the underlying comparison.
2. “DOWN”, which means a metabolite is significantly down-regulated in the underlying comparison.
3. “No Change”, which means a metabolite does not change significantly in the underlying comparison and/or is not defined as up-regulated/down-regulated based on the Log2FC threshold chosen.

Therebye “No Change” is further subdivided into four states:
1. “Not Detected”, which means a metabolite is not detected in the underlying comparison.
2. “Not Significant”, which means a metabolite is not significant in the underlying comparison.
3. “Significant positive”, which means a metabolite is significant in the underlying comparison and the differential metabolite abundance is positive, yet does not meet the threshold set for “UP” (e.g. Log2FC >1 = “UP” and we have a significant Log2FC=0.8).
4. “Significant negative”, which means a metabolite is significant in the underlying comparison and the differential metabolite abundance is negative, yet does not meet the threshold set for “DOWN”.

Lastly, we also take into account the CoRe direction, meaning if a metabolite was:
1. “Released”, which means is released into the media in both conditions of the underlying comparison.
2. “Consumed”, which means is consumed from the media in both conditions of the underlying comparison.
3. “Released/Consumed”, which means is consumed/released in one condition, whilst the opposite occurs in the second condition of the underlying comparison.
4. “Not Detected”, which means a metabolite is not detected in the underlying comparison.
This definition is done individually for each comparison and will impact in which metabolite cluster a metabolite is sorted into.
Since we have two comparisons (Intracellular and CoRe), we can choose between different Background settings, which defines which features will be considered for the clusters (e.g. you could include only features (= metabolites) that are detected in both comparisons, removing the rest of the features).The background methods backgroundMethod are the following from 1.1. - 1.4. from most restrictive to least restrictive:
1.1. Intra&CoRe: Most stringend background setting and will lead to a small number of metabolites.
1.2. CoRe: Focus is on the metabolite abundance of the CoRe.
1.3. Intra: Focus is on the metabolite abundance of intracellular.
1.4. Intra|CoRe: Least stringent background method, since a metabolite will be included in the input if it has been detected on one of the two conditions.

Lastly, we will get clusters of metabolites that are defined by the metabolite change in the two conditions. For example, if Alanine is “UP” based on the thresholds in both comparisons it will be sorted into the cluster “Core_UP”. As there are three 6-state6-state4 transitions between the comparisons, the flows are summarised into smaller amount of metabolite clusters using different Regulation Groupings (RG): 1. RG1_All
2. RG2_Significant taking into account genes that are significant (UP, DOWN, significant positive, significant negative)
3. RG3_SignificantChange only takes into account genes that have significant changes (UP, DOWN).

In order to define which group a metabolite is assigned to, we set two different thresholds. For intracellular those are based on the differential metabolite abundance (Log2FC) and the significance (e.g. p.adj). For the CoRe data this is based on the Log2 Distance and the significance (e.g. p.adj). For Log2FC we recommend a threshold of 0.5 or 1, whilst for the Log2 Distance one should check the distance ranges and base the threhold on this.

Regulatory rules:

#Example of all possible flows:
MCA_CORE <- MetaProViz::MCA_rules(Method="CoRe")
Metabolite Clustering Analysis: CoRe.
Intra CoRe Core_Direction RG1_All R2_Significant RG3_Change
DOWN DOWN Released Intra DOWN+ CoRe DOWN_Released Both_DOWN (Released) Both_DOWN (Released)
DOWN Not Detected Not Detected Intra DOWN+ CoRe Not Detected None None
DOWN Not Significant Released Intra DOWN+ CoRe Not Significant_Released None None
DOWN Significant Negative Released Intra DOWN+ CoRe Significant Negative_Released Both_DOWN (Released) None
DOWN Significant Positive Released Intra DOWN+ CoRe Significant Positive_Released Opposite (Released UP) None
DOWN UP Released Intra DOWN+ CoRe UP_Released Opposite (Released UP) Opposite (Released UP)
UP DOWN Released Intra UP+ CoRe DOWN_Released Opposite (Released DOWN) Opposite (Released DOWN)
UP Not Detected Not Detected Intra UP+ CoRe Not Detected None None
UP Not Significant Released Intra UP+ CoRe Not Significant_Released None None
UP Significant Negative Released Intra UP+ CoRe Significant Negative_Released Opposite (Released UP) None
UP Significant Positive Released Intra UP+ CoRe Significant Positive_Released Both_UP (Released) None
UP UP Released Intra UP+ CoRe UP_Released Both_UP (Released) Both_UP (Released)
Not Detected DOWN Released Intra Not Detected+ CoRe DOWN_Released CoRe_DOWN (Released) CoRe_DOWN (Released)
Not Detected Not Detected Not Detected Intra Not Detected+ CoRe Not Detected None None
Not Detected Not Significant Released Intra Not Detected+ CoRe Not Significant_Released None None
Not Detected Significant Negative Released Intra Not Detected+ CoRe Significant Negative_Released None None
Not Detected Significant Positive Released Intra Not Detected+ CoRe Significant Positive_Released None None
Not Detected UP Released Intra Not Detected+ CoRe UP_Released CoRe_UP (Released) CoRe_UP (Released)
Significant Negative DOWN Released Intra Significant Negative+ CoRe DOWN_Released Both_DOWN (Released) CoRe_DOWN (Released)
Significant Negative Not Detected Not Detected Intra Significant Negative+ CoRe Not Detected None None
Significant Negative Not Significant Released Intra Significant Negative+ CoRe Not Significant_Released None None
Significant Negative Significant Negative Released Intra Significant Negative+ CoRe Significant Negative_Released None None
Significant Negative Significant Positive Released Intra Significant Negative+ CoRe Significant Positive_Released None None
Significant Negative UP Released Intra Significant Negative+ CoRe UP_Released Opposite (Released UP) CoRe_UP (Released)
Significant Positive DOWN Released Intra Significant Positive+ CoRe DOWN_Released Opposite (Released DOWN) CoRe_DOWN (Released)
Significant Positive Not Detected Not Detected Intra Significant Positive+ CoRe Not Detected None None
Significant Positive Not Significant Released Intra Significant Positive+ CoRe Not Significant_Released None None
Significant Positive Significant Negative Released Intra Significant Positive+ CoRe Significant Negative_Released None None
Significant Positive Significant Positive Released Intra Significant Positive+ CoRe Significant Positive_Released None None
Significant Positive UP Released Intra Significant Positive+ CoRe UP_Released Both_UP (Released) CoRe_UP (Released)
Not Significant DOWN Released Intra Not Significant+ CoRe DOWN_Released CoRe_DOWN (Released) CoRe_DOWN (Released)
Not Significant Not Detected Not Detected Intra Not Significant+ CoRe Not Detected None None
Not Significant Not Significant Released Intra Not Significant+ CoRe Not Significant_Released None None
Not Significant Significant Negative Released Intra Not Significant+ CoRe Significant Negative_Released None None
Not Significant Significant Positive Released Intra Not Significant+ CoRe Significant Positive_Released None None
Not Significant UP Released Intra Not Significant+ CoRe UP_Released CoRe_UP (Released) CoRe_UP (Released)
DOWN DOWN Consumed Intra DOWN+ CoRe DOWN_Consumed Both_DOWN (Consumed) Both_DOWN (Consumed)
DOWN Not Detected Not Detected Intra DOWN+ CoRe Not Detected None None
DOWN Not Significant Consumed Intra DOWN+ CoRe Not Significant_Consumed None None
DOWN Significant Negative Consumed Intra DOWN+ CoRe Significant Negative_Consumed Both_DOWN (Consumed) None
DOWN Significant Positive Consumed Intra DOWN+ CoRe Significant Positive_Consumed Opposite (Consumed UP) None
DOWN UP Consumed Intra DOWN+ CoRe UP_Consumed Opposite (Consumed UP) Opposite (Consumed UP)
UP DOWN Consumed Intra UP+ CoRe DOWN_Consumed Opposite (Consumed DOWN) Opposite (Consumed DOWN)
UP Not Detected Not Detected Intra UP+ CoRe Not Detected None None
UP Not Significant Consumed Intra UP+ CoRe Not Significant_Consumed None None
UP Significant Negative Consumed Intra UP+ CoRe Significant Negative_Consumed Opposite (Consumed UP) None
UP Significant Positive Consumed Intra UP+ CoRe Significant Positive_Consumed Both_UP (Consumed) None
UP UP Consumed Intra UP+ CoRe UP_Consumed Both_UP (Consumed) Both_UP (Consumed)
Not Detected DOWN Consumed Intra Not Detected+ CoRe DOWN_Consumed CoRe_DOWN (Consumed) CoRe_DOWN (Consumed)
Not Detected Not Detected Not Detected Intra Not Detected+ CoRe Not Detected None None
Not Detected Not Significant Consumed Intra Not Detected+ CoRe Not Significant_Consumed None None
Not Detected Significant Negative Consumed Intra Not Detected+ CoRe Significant Negative_Consumed None None
Not Detected Significant Positive Consumed Intra Not Detected+ CoRe Significant Positive_Consumed None None
Not Detected UP Consumed Intra Not Detected+ CoRe UP_Consumed CoRe_UP (Consumed) CoRe_UP (Consumed)
Significant Negative DOWN Consumed Intra Significant Negative+ CoRe DOWN_Consumed Both_DOWN (Consumed) CoRe_DOWN (Consumed)
Significant Negative Not Detected Not Detected Intra Significant Negative+ CoRe Not Detected None None
Significant Negative Not Significant Consumed Intra Significant Negative+ CoRe Not Significant_Consumed None None
Significant Negative Significant Negative Consumed Intra Significant Negative+ CoRe Significant Negative_Consumed None None
Significant Negative Significant Positive Consumed Intra Significant Negative+ CoRe Significant Positive_Consumed None None
Significant Negative UP Consumed Intra Significant Negative+ CoRe UP_Consumed Opposite (Consumed UP) CoRe_UP (Consumed)
Significant Positive DOWN Consumed Intra Significant Positive+ CoRe DOWN_Consumed Opposite (Consumed DOWN) CoRe_DOWN (Consumed)
Significant Positive Not Detected Not Detected Intra Significant Positive+ CoRe Not Detected None None
Significant Positive Not Significant Consumed Intra Significant Positive+ CoRe Not Significant_Consumed None None
Significant Positive Significant Negative Consumed Intra Significant Positive+ CoRe Significant Negative_Consumed None None
Significant Positive Significant Positive Consumed Intra Significant Positive+ CoRe Significant Positive_Consumed None None
Significant Positive UP Consumed Intra Significant Positive+ CoRe UP_Consumed Both_UP (Consumed) CoRe_UP (Consumed)
Not Significant DOWN Consumed Intra Not Significant+ CoRe DOWN_Consumed CoRe_DOWN (Consumed) CoRe_DOWN (Consumed)
Not Significant Not Detected Not Detected Intra Not Significant+ CoRe Not Detected None None
Not Significant Not Significant Consumed Intra Not Significant+ CoRe Not Significant_Consumed None None
Not Significant Significant Negative Consumed Intra Not Significant+ CoRe Significant Negative_Consumed None None
Not Significant Significant Positive Consumed Intra Not Significant+ CoRe Significant Positive_Consumed None None
Not Significant UP Consumed Intra Not Significant+ CoRe UP_Consumed CoRe_UP (Consumed) CoRe_UP (Consumed)
DOWN DOWN Released/Consumed Intra DOWN + CoRe DOWN_Released/Consumed Both_DOWN (Released/Consumed) Both_DOWN (Released/Consumed)
DOWN Not Detected Not Detected Intra DOWN + CoRe Not Detected None None
DOWN Not Significant Released/Consumed Intra DOWN + CoRe Not Significant_Released/Consumed None None
DOWN Significant Negative Released/Consumed Intra DOWN + CoRe Significant Negative_Released/Consumed Both_DOWN (Released/Consumed) None
DOWN Significant Positive Released/Consumed Intra DOWN + CoRe Significant Positive_Released/Consumed Opposite (Released/Consumed UP) None
DOWN UP Released/Consumed Intra DOWN + CoRe UP_Released/Consumed Opposite (Released/Consumed UP) Opposite (Released/Consumed UP)
UP DOWN Released/Consumed Intra UP + CoRe DOWN_Released/Consumed Opposite (Released/Consumed DOWN) Opposite (Released/Consumed DOWN)
UP Not Detected Not Detected Intra UP + CoRe Not Detected None None
UP Not Significant Released/Consumed Intra UP + CoRe Not Significant_Released/Consumed None None
UP Significant Negative Released/Consumed Intra UP + CoRe Significant Negative_Released/Consumed Opposite (Released/Consumed UP) None
UP Significant Positive Released/Consumed Intra UP + CoRe Significant Positive_Released/Consumed Both_UP (Released/Consumed) None
UP UP Released/Consumed Intra UP + CoRe UP_Released/Consumed Both_UP (Released/Consumed) Both_UP (Released/Consumed)
Not Detected DOWN Released/Consumed Intra Not Detected + CoRe DOWN_Released/Consumed CoRe_DOWN (Released/Consumed) CoRe_DOWN (Released/Consumed)
Not Detected Not Detected Not Detected Intra Not Detected + CoRe Not Detected None None
Not Detected Not Significant Released/Consumed Intra Not Detected + CoRe Not Significant_Released/Consumed None None
Not Detected Significant Negative Released/Consumed Intra Not Detected + CoRe Significant Negative_Released/Consumed None None
Not Detected Significant Positive Released/Consumed Intra Not Detected + CoRe Significant Positive_Released/Consumed None None
Not Detected UP Released/Consumed Intra Not Detected + CoRe UP_Released/Consumed CoRe_UP (Released/Consumed) CoRe_UP (Released/Consumed)
Significant Negative DOWN Released/Consumed Intra Significant Negative + CoRe DOWN_Released/Consumed Both_DOWN (Released/Consumed) CoRe_DOWN (Released/Consumed)
Significant Negative Not Detected Not Detected Intra Significant Negative + CoRe Not Detected None None
Significant Negative Not Significant Released/Consumed Intra Significant Negative + CoRe Not Significant_Released/Consumed None None
Significant Negative Significant Negative Released/Consumed Intra Significant Negative + CoRe Significant Negative_Released/Consumed None None
Significant Negative Significant Positive Released/Consumed Intra Significant Negative + CoRe Significant Positive_Released/Consumed None None
Significant Negative UP Released/Consumed Intra Significant Negative + CoRe UP_Released/Consumed Opposite (Released/Consumed UP) CoRe_UP (Released/Consumed)
Significant Positive DOWN Released/Consumed Intra Significant Positive + CoRe DOWN_Released/Consumed Opposite (Released/Consumed DOWN) CoRe_DOWN (Released/Consumed)
Significant Positive Not Detected Not Detected Intra Significant Positive + CoRe Not Detected None None
Significant Positive Not Significant Released/Consumed Intra Significant Positive + CoRe Not Significant_Released/Consumed None None
Significant Positive Significant Negative Released/Consumed Intra Significant Positive + CoRe Significant Negative_Released/Consumed None None
Significant Positive Significant Positive Released/Consumed Intra Significant Positive + CoRe Significant Positive_Released/Consumed None None
Significant Positive UP Released/Consumed Intra Significant Positive + CoRe UP_Released/Consumed Both_UP (Released/Consumed) CoRe_UP (Released/Consumed)
Not Significant DOWN Released/Consumed Intra Not Significant + CoRe DOWN_Released/Consumed CoRe_DOWN (Released/Consumed) CoRe_DOWN (Released/Consumed)
Not Significant Not Detected Not Detected Intra Not Significant + CoRe Not Detected None None
Not Significant Not Significant Released/Consumed Intra Not Significant + CoRe Not Significant_Released/Consumed None None
Not Significant Significant Negative Released/Consumed Intra Not Significant + CoRe Significant Negative_Released/Consumed None None
Not Significant Significant Positive Released/Consumed Intra Not Significant + CoRe Significant Positive_Released/Consumed None None
Not Significant UP Released/Consumed Intra Not Significant + CoRe UP_Released/Consumed CoRe_UP (Released/Consumed) CoRe_UP (Released/Consumed)


Now we can load the corresponding pre-processed intracellular example data for the comparison of 786M-1A versus HK2 (For the detailed pre-processing please see the vignette “Standard Metabolomics”).

#Load the Pre-processed intracellular data:
Intra_DMA_786M1A_vs_HK2<- MetaProViz::ToyData(Data="IntraCells_DMA")

#Perform metabolite clustering:
MCA_CoRe_res <- MetaProViz::MCA_CoRe(InputData_Intra =Intra_DMA_786M1A_vs_HK2%>%rownames_to_column("Metabolite") ,
                                     InputData_CoRe = DMA_786M1A_vs_HK2,
                                     SettingsInfo_Intra=c(ValueCol="Log2FC",StatCol="p.adj", StatCutoff= 0.05, ValueCutoff=1),
                                     SettingsInfo_CoRe=c(DirectionCol="CoRe", ValueCol="Log2(Distance)",StatCol="p.adj", StatCutoff= 0.05, ValueCutoff=28),
                                     FeatureID= "Metabolite",
                                     BackgroundMethod="Intra&CoRe",
                                     FolderPath=NULL)

#Lets check how the results look like:
MCA_res <- MCA_CoRe_res[["MCA_CoRe_Results"]]
ClusterSummary <- MCA_CoRe_res[["MCA_CoRe_Summary"]]
MetaProViz::MCA_CoRe for the comparison of 786-M1A versus HK2 cells in intracellular and CoRe samples.
Metabolite Intra_DF_Cutoff Intra_DF_Cutoff_Specific.x CoRe_DF_Detected CoRe_DF_Cutoff CoRe_DF_Cutoff_Specific BG_Method RG1_All RG2_Significant RG3_Change
adenosine No Change Not Significant FALSE No Change Not Detected FALSE Background = FALSE Background = FALSE Background = FALSE
ADP No Change Not Significant FALSE No Change Not Detected FALSE Background = FALSE Background = FALSE Background = FALSE
betaine No Change Significant Negative TRUE UP UP TRUE Intra Significant Negative + CoRe UP_Released Opposite (Released UP) CoRe_UP (Released)
creatine No Change Significant Positive TRUE UP UP TRUE Intra Significant Positive + CoRe UP_Released/Consumed Both_UP (Released/Consumed) CoRe_UP (Released/Consumed)
MetaProViz::MCA_CoRe Summary of number of metabolites per cluster.
Regulation Grouping SiRCle cluster Name Number of Features
RG2_Significant Both_UP (Released/Consumed) 2
RG2_Significant Both_DOWN (Released/Consumed) 10
RG2_Significant Opposite (Consumed DOWN) 2
RG2_Significant Opposite (Released UP) 2
RG2_Significant Both_DOWN (Consumed) 2
RG2_Significant CoRe_DOWN (Released/Consumed) 4
RG2_Significant CoRe_UP (Released/Consumed) 2
RG2_Significant CoRe_UP (Released) 1
RG2_Significant CoRe_DOWN (Consumed) 1
RG3_Change Background = FALSE 151
RG3_Change None 22
RG3_Change Both_UP (Released/Consumed) 1
RG3_Change CoRe_DOWN (Released/Consumed) 8
RG3_Change CoRe_DOWN (Consumed) 5
RG3_Change Both_DOWN (Released/Consumed) 6
RG3_Change Opposite (Released UP) 1
RG3_Change CoRe_UP (Released) 2
RG3_Change CoRe_UP (Released/Consumed) 3

ORA on each metabolite cluster

As explained in detail above, Over Representation Analysis (ORA) is a pathway enrichment analysis (PEA) method. As ORA is based on the Fishers exact test it is perfect to test if a set of features (=metabolic pathways) are over-represented in the selection of features (= clusters of metabolites) from the data in comparison to all measured features (all metabolites). In detail, MC_ORA() will perform ORA on each of the metabolite clusters using all metabolites as the background.
Pathway Input for MetaProViz::MC_ORA.
HMDB KEGG.ID KEGGCompound Pathway
N-acetylaspartate HMDB0000812 C01042 N-Acetyl-L-aspartate Alanine, aspartate and glutamate metabolism
argininosuccinate HMDB0000052 C03406 N-(L-Arginino)succinate Alanine, aspartate and glutamate metabolism
N-acetylaspartylglutamate HMDB0001067 C12270 N-Acetylaspartylglutamate Alanine, aspartate and glutamate metabolism
tyrosine HMDB0000158 C00082 L-Tyrosine Amino acid metabolism
asparagine HMDB0000168 C00152 L-Asparagine Amino acid metabolism
glutamate HMDB0000148 C00025 L-Glutamate Amino acid metabolism
MC_ORA_result<- MetaProViz::ClusterORA(InputData=MCA_CoRe_res[["MCA_CoRe_Results"]]%>%column_to_rownames("Metabolite"),
                                       SettingsInfo=c(ClusterColumn="RG2_Significant",
                                                        BackgroundColumn="BG_Method",
                                                        PathwayTerm= "Pathway", #This is the column name including the pathways names
                                                        PathwayFeature= "Metabolite"),
                                       RemoveBackground=TRUE,
                                       PathwayFile=MappingInfo%>%rownames_to_column("Metabolite"),
                                       PathwayName="KEGG",
                                       minGSSize=3,
                                       maxGSSize=1000 ,
                                       SaveAs_Table= "csv")
#> Number of metabolites in cluster `None`: 22
#> Number of metabolites in cluster `Both_UP (Released/Consumed)`: 2
#> Number of metabolites in cluster `Both_DOWN (Released/Consumed)`: 10
#> Number of metabolites in cluster `Opposite (Consumed DOWN)`: 2
#> Number of metabolites in cluster `Opposite (Released UP)`: 2
#> Number of metabolites in cluster `Both_DOWN (Consumed)`: 2
#> Number of metabolites in cluster `CoRe_DOWN (Released/Consumed)`: 4
#> Number of metabolites in cluster `CoRe_UP (Released/Consumed)`: 2
#> Number of metabolites in cluster `CoRe_UP (Released)`: 1
#> Number of metabolites in cluster `CoRe_DOWN (Consumed)`: 1

#Lets check how the results look like:
Both_UP_Released <- MC_ORA_result[["DF"]][["Both_UP (Released)"]]
MetaProViz::MC_ORA results for the RG2_Significant cluster Both_UP (Released).


Here we see that the pathways have a low amount of genes included that were also part of the cluster and the pathways are not significant. This is due to multiple factors, first we only start with a small number of metabolites with KEGG IDs and secondly we only included metabolites if they where detected in both, intracellular and CoRe samples (parameter backgroundMethod="Intra&CoRe"). Hence, by for example setting parameter backgroundMethod="Intra|CoRe", we will obtain larger metabolite clusters.

3. Run MetaProViz Visualisation

The big advantages of the MetaProViz visualization module is its flexible and easy usage, which we will showcase below and that the figures are saved in a publication ready style and format. For instance, the x- and y-axis size will always be adjusted for the amount of samples or features (=metabolites) plotted, or in the case of Volcano plot and PCA plot the axis size is fixed and not affected by figure legends or title. In this way, there is no need for many adjustments and the figures can just be dropped into the presentation or paper and are all in the same style.

All the VizPlotName() functions are constructed in the same way. Indeed, with the parameter Plot_SettingsInfo the user can pass a named vector with information about the metadata column that should be used to customize the plot by colour, shape or creating individual plots, which will all be showcased for the different plot types. Via the parameter Plot_SettingsFile the user can pass the metadata DF, which can be dependent on the plot type for the samples and/or the features (=metabolites). In case of both the parameter is named Plot_SettingsFile_Sample and Plot_SettingsFile_Metab.

In each of those Plot_Settings, the user can label color and/or shape based on additional information (e.g. Pathway information, Cluster information or other other demographics like gender). Moreover, we also enable to plot individual plots where applicable based on those MetaData (e.g. one plot for each metabolic pathway).
For this we need a metadata table including information about our samples that could be relevant to e.g. color code:

MetaData_Sample <- Media_Preprocessed[,c(1:2)]%>%
   mutate(Status = case_when(Conditions=="HK2" ~ 'Healthy',
                               TRUE ~ 'Cancer'))
Metadata table including additional information about our Samples.
Conditions Biological_Replicates Status
MS51-07 HK2 2 Healthy
MS51-08 HK2 3 Healthy
MS51-09 HK2 4 Healthy
MS51-10 HK2 5 Healthy
MS51-11 786-O 1 Cancer
MS51-12 786-O 2 Cancer
MS51-13 786-O 3 Cancer
MS51-14 786-O 4 Cancer
MS51-15 786-O 5 Cancer
MS51-16 786-M1A 1 Cancer
MS51-17 786-M1A 2 Cancer
MS51-18 786-M1A 3 Cancer
MS51-19 786-M1A 4 Cancer
MS51-20 786-M1A 5 Cancer
MS51-21 786-M2A 1 Cancer
MS51-23 786-M2A 2 Cancer
MS51-24 786-M2A 3 Cancer
MS51-25 786-M2A 4 Cancer
MS51-26 OSRC2 1 Cancer
MS51-27 OSRC2 2 Cancer
MS51-28 OSRC2 3 Cancer
MS51-29 OSRC2 4 Cancer
MS51-30 OSRC2 5 Cancer
MS51-31 OSLM1B 1 Cancer
MS51-32 OSLM1B 2 Cancer
MS51-33 OSLM1B 3 Cancer
MS51-34 OSLM1B 4 Cancer
MS51-35 OSLM1B 5 Cancer
MS51-36 RFX631 1 Cancer
MS51-37 RFX631 2 Cancer
MS51-38 RFX631 3 Cancer
MS51-39 RFX631 4 Cancer
MS51-40 RFX631 5 Cancer


Moreover, we can use MetaData for our features (=Metabolites), which we loaded with the MappingInfo and we can also add the information on which cluster a metabolite was assigned to in the MetaProViz::MCA() analysis above:

MetaData_Metab <-MappingInfo
Metadata table including additional information about the Metabolites.
HMDB KEGG.ID KEGGCompound Pathway
HMDB0001067 C12270 N-Acetylaspartylglutamate Alanine, aspartate and glutamate metabolism
HMDB0000158 C00082 L-Tyrosine Amino acid metabolism
HMDB0000148 C00025 L-Glutamate Amino acid metabolism
HMDB0250980 NA NA Amino acid metabolism
NA NA NA Amino acid metabolism
HMDB0004207 NA NA Amino acid metabolism


Noteworthy, here we can also use the KEGG pathways we used for the pathway analysis.

PCA plots

Principal component analysis (PCA) is a dimensionality reduction method that reduces all the measured features (=metabolites) of one sample into a few features in the different principal components, whereby each principal component can explain a certain percentage of the variance between the different samples. Hence, this enables interpretation of sample clustering based on the measured features (=metabolites).
As mentioned above, PCA plots can be quite useful for quality control, but of course it offers us many more opportunities, which will be showcased here.

As input, we need a DF that contains the samples as rownames and the features (=metabolites) as column names:

Input_PCA <- Media_Preprocessed[,-c(1:4)] #remove columns that include Metadata such as cell type,...
Input_data for MetaProViz::VizPCA(), with samples as rownames and metabolites as column names.
valine-d8 hipppuric acid-d5 2-hydroxyglutarate 2-ketoglutarate 3-Dehydro-L-threonate acetylcarnitine acetylcholine
MS51-07 5049281744 30218506574 -868601575 1530912430 28673393701 -67547592848 -496576017
MS51-08 -1441406385 -11561761745 1108346025 3684365990 39313101752 -234677799645 -124297109
MS51-09 2563904070 -10237061776 -189633550 -9097094735 -67782920598 218513442831 526295913
MS51-10 -6171779428 -8419683053 -50110899 3881816314 -203574855 83711949661 94577213
MS51-11 27378617230 123649127347 4311754839 23286535748 -16721754062 -434945308541 -183109349
MS51-12 31998154622 99965859879 808357705 19381509727 -16349278172 -583665678018 -998475065


Now lets check out the standard plot:

MetaProViz::VizPCA(InputData=Input_PCA)
Figure: Standard Settings.

Figure: Standard Settings.

Next, we can interactively choose shape and color using the additional information of interest from our Metadata. Especially for complex data, such as patient data, it can be valuable to use different demographics (e.g. age, gender, medication,…) for this. First lets check if we have any batch effect by colour coding for the biological replicates, which would be the case if the replicates cluster together.

MetaProViz::VizPCA(SettingsInfo= c(color="Biological_Replicates"),
                   SettingsFile_Sample = MetaData_Sample ,
                   InputData=Input_PCA,
                   PlotName = "Batch Effect")
Figure: Do we have a batch effect?

Figure: Do we have a batch effect?

Given the biological replicates are numeric, we can also set color_scale to continuous:

MetaProViz::VizPCA(SettingsInfo= c(color="Biological_Replicates"),
                   SettingsFile_Sample = MetaData_Sample ,
                   InputData=Input_PCA,
                   ColorScale = "continuous",
                   PlotName = "Batch Effect (continuous color scale)")
Figure: Do we have a batch effect?

Figure: Do we have a batch effect?

Next, we can colour code for condition and use the biological replicates in the shape parameter:

MetaProViz::VizPCA(SettingsInfo= c(color="Conditions", shape="Biological_Replicates"),
                   SettingsFile_Sample = MetaData_Sample ,
                   InputData=Input_PCA,
                   PlotName = "Sample Conditions")
Figure: Do the samples cluster for the conditions?

Figure: Do the samples cluster for the conditions?

The different cell lines we have are either control or cancerous, so we can display this too.

MetaProViz::VizPCA(SettingsInfo=  c(color="Status"),
                   SettingsFile_Sample = MetaData_Sample ,
                   InputData=Input_PCA,
                   PlotName = "Sample Status")
Figure: Do the samples cluster for the Cell status?

Figure: Do the samples cluster for the Cell status?

Heatmaps

Clustered heatmaps can be useful to understand the patterns in the data, which will be showcased on different examples.
As input, we need a DF that contains the samples as rownames and the features (=metabolites) as column names:

Input_Heatmap <-   Media_Preprocessed[,-c(1:4)] #remove columns that include Metadata such as cell type,...
Input for MetaProViz::VizHeatmap(), with samples as rownames and metabolites as column names.
valine-d8 hipppuric acid-d5 2-hydroxyglutarate 2-ketoglutarate 3-Dehydro-L-threonate
MS51-07 5049281744 30218506574 -868601575 1530912430 28673393701
MS51-08 -1441406385 -11561761745 1108346025 3684365990 39313101752
MS51-09 2563904070 -10237061776 -189633550 -9097094735 -67782920598
MS51-10 -6171779428 -8419683053 -50110899 3881816314 -203574855
MS51-11 27378617230 123649127347 4311754839 23286535748 -16721754062
MS51-12 31998154622 99965859879 808357705 19381509727 -16349278172


Now we can generate an overview heatmap. Since we plot all metabolites the metabolite names are not plotted since this would get too crowded (You can enforce this by changing the parameter enforce_FeatureNames = TRUE).

MetaProViz::VizHeatmap(InputData = Input_Heatmap,
                       PlotName = "Overview")
Overview heatmap.

Overview heatmap.


Here we can add as many sample metadata information as needed at the same time:

MetaProViz::VizHeatmap(InputData = Input_Heatmap,
                       SettingsFile_Sample = MetaData_Sample,
                       SettingsInfo = c(color_Sample = list("Conditions","Biological_Replicates", "Status")),
                       PlotName = "Colour Samples")
Colour for sample metadata.

Colour for sample metadata.


Moreover, we can also add metabolite metadata information:

MetaProViz::VizHeatmap(InputData = Input_Heatmap,
                       SettingsFile_Sample = MetaData_Sample,
                       SettingsInfo = c(color_Metab = list("Pathway")),
                       SettingsFile_Metab =  MappingInfo,
                       PlotName = "Colour Metabolites")
Colour for metabolite metadata.

Colour for metabolite metadata.


Lastly, by generate individual plot for e.g. each pathway or the metabolite clusters by adding individual (individual_Sample or individual_Metab) to Plot_SettingsInfo. At the same time we can still maintain the metadata information for both, the samples and the metabolites. Together this can help us to draw biological conclusions about the different pathways: Indeed, we can observe for the D-Amino acid metabolism many metabolites fall into the MCA-Cluster Core_DOWN, meaning in comparison to HK2 cells we have a negative Log2FC for 786-O and 786-M1A.

# individual: One individual plot for each pathway, col annotation: Colour for samples
MetaProViz::VizHeatmap(InputData = Input_Heatmap,
                       SettingsFile_Sample = MetaData_Sample,
                       SettingsInfo = c(individual_Metab = "Pathway",
                                        color_Sample = list("Conditions","Biological_Replicates"),
                                        color_Metab = list("Pathway")),
                       SettingsFile_Metab =  MetaData_Metab,
                       PlotName = "Pathway")




You can also choose to make individual plots for any Sample Metadata using individual_Sample (e.g. in patients you may want to plot male and female separately). Moreover, you can also use both at the same time.

Superplots

Sometimes one might be interested to create individual plots for each metabolite to understand the differences between specific conditions. For this common plot types are bargraphs, boxplots or violin plots. As input, we need a DF that contains the samples as rownames and the features (=metabolites) as column names:

Input_Superplot <-  Media_Preprocessed[,-c(1:4)]#remove columns that include Metadata such as cell type,...
Input for MetaProViz::VizSuperplot(), with samples as rownames and metabolites as column names.
valine-d8 hipppuric acid-d5 2-hydroxyglutarate 2-ketoglutarate 3-Dehydro-L-threonate acetylcarnitine acetylcholine
MS51-07 5049281744 30218506574 -868601575 1530912430 28673393701 -67547592848 -496576017
MS51-08 -1441406385 -11561761745 1108346025 3684365990 39313101752 -234677799645 -124297109
MS51-09 2563904070 -10237061776 -189633550 -9097094735 -67782920598 218513442831 526295913
MS51-10 -6171779428 -8419683053 -50110899 3881816314 -203574855 83711949661 94577213
MS51-11 27378617230 123649127347 4311754839 23286535748 -16721754062 -434945308541 -183109349
MS51-12 31998154622 99965859879 808357705 19381509727 -16349278172 -583665678018 -998475065
MS51-13 26220971337 99972874145 -235593734 19536285942 -28131696536 -645802166950 -822951440
MS51-14 27882417489 115689376712 1115432087 17101698081 -26309877624 -275257639669 -1272434260
MS51-15 24806237271 132343157722 -1179339178 17846663766 -40351828895 -321374884275 -1318142384
MS51-16 11094140749 57354194293 5445671697 28056643231 9905712215 -356638276041 -929117918
MS51-17 34144526544 116670253290 3041583921 22341957663 -1871784883 -379917701289 -1217951807
MS51-18 30843226841 131915991104 2364979960 19876068811 -12106631836 -297445077871 -907725948
MS51-19 26032777849 93842230196 600387750 22338213828 -15996801077 -325213646459 -1216114751
MS51-20 15413411654 47939629182 4007911817 20555474456 13695406411 -438100232873 -1430510133
MS51-21 53001553101 127450060294 4809729749 39166236259 -38896232803 -125539074625 -943874817
MS51-23 47001796040 161667174110 6707051392 38517108736 -6979130432 -673387467221 -664880164
MS51-24 8217942926 37130736106 5857701472 39276732227 -10740427138 -333936377093 -1206364609
MS51-25 7242347148 57539536462 4568079831 40583987397 -18377163802 -282195052525 -1314339238
MS51-26 1631567949 36681096085 1121378276 -2694080144 -60644185117 -198965081888 -843128099
MS51-27 1209476862 -8930444074 3405336890 -2060735139 -51783244894 -209186167980 -365874316
MS51-28 -4518925715 -2435870916 7007395646 -2525310574 -30952324005 -408114545906 -609942826
MS51-29 1484077336 -2058725307 3515115140 -5164411612 -42307838881 -287356849930 -912778912
MS51-30 4368619286 15502685767 -1093856576 -4703700433 -60704125766 -260908005179 -383882500
MS51-31 7907484382 12534261776 -2452294215 -8563910085 -75125518110 -289811675340 -918369787
MS51-32 10026207227 8766901671 1953315759 -7520737233 -55168263224 -890597864901 -314644061
MS51-33 1320088121 -19794580878 -541809575 -6750277825 -52231898984 -770337735939 -182991027
MS51-34 4019534162 -21662185450 1428924068 -7578531520 -47605477000 -741976996697 -550044215
MS51-35 -11674356179 -75718957578 972005613 -6094793780 -57165852971 -822161480737 -503488121
MS51-36 9892941550 46138316334 4954019648 5560697649 -23867649475 -71475573825 -1119704368
MS51-37 14414593887 78736973172 -257560661 -2341634721 -51914744178 -35405004462 -687055540
MS51-38 7716702900 3492507671 -2307137058 -4331637640 -67900220172 230840109765 -408157136
MS51-39 -583337199 -9483365742 706088109 -2757559248 -56366820622 -263196399957 -227087862
MS51-40 11097184979 24818171463 3186690665 -4462779576 -50563367720 145901488878 -319389629

We also need the Metadata as we will need to know which conditions to plot for together. If you have further information such as replicates or patient ID, we can use this for the colour of the plotted samples per condition as in the superplots style as described in by Lord et al (Lord et al. 2020).

MetaProViz:::VizSuperplot(InputData =Input_Superplot[,c(1:6)],#We just plot six metabolites
                                           SettingsFile_Sample =MetaData_Sample,
                                           SettingsInfo = c(Conditions="Conditions", Superplot = "Biological_Replicates"),
                                           PlotType = "Bar", #Bar, Box, Violin
                                           PlotConditions = c("HK2", "786-O", "786-M1A", "786-M2A", "OSRC2", "OSLM1B", "RFX631"),#sets the order in which the samples should be plotted
                                           StatComparisons = list(c(1,2),c(1,4)))#Stat comparisons to be included on the plot




Now, if we for instance prefer boxplots over bargraphs we can simply change the parameter PlotType:

MetaProViz:::VizSuperplot(InputData =Input_Superplot[,c(1:6)],#We just plot six metabolites
                                           SettingsFile_Sample =MetaData_Sample,
                                           SettingsInfo = c(Conditions="Conditions", Superplot = "Biological_Replicates"),
                                           PlotType = "Box", #Bar, Box, Violin
                                           PlotConditions = c("HK2", "786-O", "786-M1A", "786-M2A", "OSRC2", "OSLM1B", "RFX631"),#sets the order in which the samples should be plotted
                                           StatComparisons = list(c(1,2),c(1,4)))#Stat comparisons to be included on the plot




We can also change it to violin plots:

MetaProViz:::VizSuperplot(InputData =Input_Superplot[,c(1:6)],#We just plot six metabolites
                                           SettingsFile_Sample =MetaData_Sample,
                                           SettingsInfo = c(Conditions="Conditions", Superplot = "Biological_Replicates"),
                                           PlotType = "Violin", #Bar, Box, Violin
                                           PlotConditions = c("HK2", "786-O", "786-M1A", "786-M2A", "OSRC2", "OSLM1B", "RFX631"),#sets the order in which the samples should be plotted
                                           StatComparisons = list(c(1,2),c(1,4)))#Stat comparisons to be included on the plot




Volcano plot

In general,we have three different Plot_Settings, which will also be used for other plot types such as lollipop graphs.
1. "Standard" is the standard version of the plot, with one dataset being plotted.
2. "Conditions" here two or more datasets will be plotted together. How datasets can be plotted together depends on the plot type.
3. "PEA" stands for Pathway Enrichment Analysis, and is used if the results of an GSE analysis should be plotted as here the figure legends will be adapted. You can find an example for this in the vignette Standard Metabolomics

Here we will look at all the different options we have to display our results from the different analysis, which will help us to interpret our results as this can be sometimes difficult to do from the many data tables.
Just a quick reminder, how the input data look like:
1. Results of Differential Metabolite Analysis (DMA): Log2(Distance) and stats:
Input_data for MetaProViz::VizVolcano() are for example differential analysis results from MetaProViz::DMA().
Metabolite Log2(Distance) p.adj t.val Mean_786-M1A CoRe_786-M1A
2-hydroxyglutarate 31.52594 0.3447870 -3092107029 3092107028.86536 Released
2-ketoglutarate 34.39775 0.0000000 -22633671598 22633671597.7763 Released
3-Dehydro-L-threonate -30.24765 0.9999999 1274819834 -1274819834.18323 Consumed
acetylcarnitine -38.38705 0.0653736 359462986907 -359462986906.515 Consumed
acetylcholine -30.08675 0.0004996 1140284111 -1140284111.31361 Consumed
acetylornithine 32.65274 0.0012793 -6752323351 6752323351.13854 Released
aconitate -32.58145 0.0000000 6426780519 -6426780518.51619 Consumed
Standard

Here we will first look into the results from the differential analysis (see section DMA above) for the comparison of 786-M1A_vs_HK2:

# Run with default parameter --> only need to provide Input_data and the title we like
MetaProViz::VizVolcano(InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)")
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


If you seek to plot the metabolite names you can change the paramter SelectLab from its default (SelectLab="") to NULL and the metabolite names will be plotted randomly.

# Run with default parameter --> only need to provide Input_data and the title we like
MetaProViz::VizVolcano(InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       SelectLab = NULL)
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


With the parameter SelectLab you can also pass a vector with Metabolite names that should be labeled:

# Run with default parameter --> only need to provide Input_data and the title we like
MetaProViz::VizVolcano(InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       SelectLab = c("histidine", "phenylalanine", "lactate"))
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


As explained above, when analyzing CoRe data it is important to take into account if a metabolite is consumed or released. we can use this information to colour code and or shape the metabolites on the plot.
For this we need to add this information into the Metadata_Metabolite file:

# colour for consumption and release: For this we need to add this information into the Metadata_Metabolite file
MetaData_Metab <- merge(MappingInfo%>%rownames_to_column("Metabolite"), DMA_786M1A_vs_HK2[,c(1,6,8:10)], by="Metabolite", all.y=TRUE)%>%
  column_to_rownames("Metabolite")
Metadata table including additional information about the Metabolites.
HMDB KEGG.ID KEGGCompound Pathway CoRe_786-M1A CoRe_HK2 CoRe_specific CoRe
3-Dehydro-L-threonate NA NA NA Amino acid metabolism Consumed Consumed Consumed Consumed
acetylcarnitine HMDB0000201 C02571 O-Acetylcarnitine Fatty acyl carnitines Consumed Released Consumed in 786-M1A and Released HK2 Released/Consumed
acetylornithine HMDB0003357 C00437 N-Acetylornithine Arginine and proline metabolism Released Consumed Released in 786-M1A and Consumed HK2 Released/Consumed
glutamate HMDB0000148 C00025 L-Glutamate Amino acid metabolism Released Consumed Released in 786-M1A and Consumed HK2 Released/Consumed
glutamine HMDB0000641 C00064 L-Glutamine Amino acid metabolism Released Consumed Released in 786-M1A and Consumed HK2 Released/Consumed
glycerylphosphorylcholine HMDB0252858 NA NA Not assigned Consumed Consumed Consumed Consumed


Now we can make the different plots:

#Now we need to add our Plot_SettingsFile and the Plot_SettingsInfo:
MetaProViz::VizVolcano(PlotSettings="Standard",
                       SettingsInfo= c(color="CoRe_specific"),
                       SettingsFile_Metab= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A versus HK2",
                       Subtitle= "Results of DMA. Colour coded for consumption/release" )
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


#If we want to use the shape instead of the colour for the cluster info, we can just change our Plot_SettingsInfo
MetaProViz::VizVolcano(PlotSettings="Standard",
                       SettingsInfo= c(shape="CoRe_specific"),
                       SettingsFile= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A versus HK2",
                       Subtitle= "Results of DMA. Shape for consumption/release, color for significance." )
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


#Of course, we can also adapt both, color and shape for the same parameter:
MetaProViz::VizVolcano(PlotSettings="Standard",
                       SettingsInfo= c(shape="CoRe_specific", color="CoRe_specific"),
                       SettingsFile= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A versus HK2",
                       Subtitle= "Results of DMA. Shape and color for consumption/release." )
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


Of course, here we may also want an individual plot for each of the consumption/release metabolites.

#individual plot for each metabolite behaviour:
MetaProViz::VizVolcano(PlotSettings="Standard",
                       SettingsInfo= c(individual="CoRe", shape="CoRe_specific"),
                       SettingsFile= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A versus HK2",
                       Subtitle= "Results of DMA." )
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.

#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
#> Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.

#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
#> Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
Figure: Standard figure displaying DMA results.

Figure: Standard figure displaying DMA results.


Given that we also know, which metabolic pathway the metabolites correspond to, we can add this information into the plot. This is also a good example to showcase the flexibility of the visualisation function: Either you use the parameter Plot_SettingsFile= MetaData_Metab as above, but as we have the column “Pathway” also in our Input_data you can also pass Plot_SettingsFile= DMA_786-M1A_vs_HK2 or simply use the default Plot_SettingsFile=NULL, in which case the Plot_SettingsInfo information (here color) will be used from Input_data.

#Now we can use color for the pathways and shape for the metabolite clusters:
MetaProViz::VizVolcano(PlotSettings="Standard",
                       SettingsInfo= c(individual="CoRe", shape="CoRe_specific", color="Pathway"),
                       SettingsFile= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A versus HK2",
                       Subtitle= "Results of DMA." )
Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
#> Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
#> Removed 2 rows containing missing values or values outside the scale range
#> (`geom_vline()`).
Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

Figure: Standard figure displaying DMA results colour coded for metabolic pathways and shaped for metabolic clusters.

Comparison

The parameter Plot_Settings="Compare" is helpful if you have performed multiple comparisons and seek to compare two of them in one plot:

#Make the plot
MetaProViz::VizVolcano(PlotSettings="Compare",
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       InputData2= DMA_Annova[["DMA"]][["786-O_vs_HK2"]]%>%column_to_rownames("Metabolite"),
                       ComparisonName= c(InputData="786M1A_vs_HK", InputData2= "786-O_vs_HK2"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A vs HK2 compared to 7860 vs HK2",
                       Subtitle= "Results of DMA" )
Figure: Comparison.

Figure: Comparison.


Of course you have option to use shape or color to further customize your graph as well as make individual plots:

#Make the plot
MetaProViz::VizVolcano(PlotSettings="Compare",
                       SettingsInfo= c(color="Pathway"),
                       SettingsFile_Metab= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       InputData2= DMA_Annova[["DMA"]][["786-O_vs_HK2"]]%>%column_to_rownames("Metabolite"),
                       ComparisonName= c(InputData="786M1A_vs_HK", InputData2= "786-O_vs_HK2"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A vs HK2 compared to 7860 vs HK2",
                       Subtitle= "Results of DMA" )
Figure: Comparison.

Figure: Comparison.

Now we do individual plots again:

MetaProViz::VizVolcano(PlotSettings="Compare",
                       SettingsInfo= c(individual="Pathway"),
                       SettingsFile_Metab= MetaData_Metab,
                       InputData=DMA_786M1A_vs_HK2%>%column_to_rownames("Metabolite"),
                       InputData2= DMA_Annova[["DMA"]][["786-O_vs_HK2"]]%>%column_to_rownames("Metabolite"),
                       ComparisonName= c(InputData="786M1A_vs_HK", InputData2= "786-O_vs_HK2"),
                       x= "Log2(Distance)",
                       PlotName= "786M1A vs HK2 compared to 7860 vs HK2",
                       Subtitle= "Results of DMA" )



PathwayEnrichmentAnalysis

If you have performed Pathway Enrichment Analysis (PEA) such as ORA or GSEA, we can also plot the results and add the information into the Figure legends.
Here we can for example use the results of the ORA we have performed on the differential expression results. Indeed for DMA_786M1A_vs_HK2 we have performed ORA on each cluster (consumed, released, consumed/released). Here, I will plot the ORA results of the metabolites that are released in both conditions, HK2 and 786-M1A.

#Prepare the Input:
#1. InputData=Pathway analysis input: Must have features as column names. Those feature names need to match features in the pathway analysis file SettingsFile_Metab.
InputPEA <- DMA_786M1A_vs_HK2 %>%
  filter(!is.na(KEGGCompound)) %>%
  column_to_rownames("KEGGCompound")

#2. InputData2=Pathway analysis output: Must have same column names as SettingsFile_Metab for Pathway name
InputPEA2 <- MC_ORA_786M1A_vs_HK2_Consumed %>%
  dplyr::rename("term"="ID")

#3. SettingsFile_Metab= Pathways used for pathway analysis: Must have same column names as SettingsFile_Metab for Pathway name and feature names need to match features in the InputData. PEA_Feature passes this column name!


MetaProViz::VizVolcano(PlotSettings="PEA",
                       SettingsInfo= c(PEA_Pathway="term",# Needs to be the same in both, SettingsFile_Metab and InputData2.
                                       PEA_stat="p.adjust",#Column InputData2
                                       PEA_score="GeneRatio",#Column InputData2
                                       PEA_Feature="Metabolite"),# Column SettingsFile_Metab (needs to be the same as row names in InputData)
                       SettingsFile_Metab= KEGG_Pathways,#Must be the pathways used for pathway analysis
                       InputData= InputPEA, #Must be the data you have used as an input for the pathway analysis
                       InputData2= InputPEA2, #Must be the results of the pathway analysis
                       x= "Log2(Distance)",
                       PlotName= "KEGG",
                       Subtitle= "PEA" ,
                       SelectLab = NULL)



Session information

#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
#>  [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
#> [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggfortify_0.4.17 rlang_1.1.4      tibble_3.2.1     dplyr_1.1.4      magrittr_2.0.3   MetaProViz_2.1.3
#> [7] ggplot2_3.5.1   
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.4.2           later_1.4.1             ggplotify_0.1.2         bitops_1.0-9           
#>   [5] R.oo_1.27.0             cellranger_1.1.0        polyclip_1.10-7         XML_3.99-0.17          
#>   [9] factoextra_1.0.7        lifecycle_1.0.4         rstatix_0.7.2           lattice_0.22-6         
#>  [13] vroom_1.6.5             MASS_7.3-61             backports_1.5.0         limma_3.58.1           
#>  [17] sass_0.4.9              rmarkdown_2.29          jquerylib_0.1.4         yaml_2.3.10            
#>  [21] zip_2.3.1               EnhancedVolcano_1.20.0  qcc_2.7                 RColorBrewer_1.1-3     
#>  [25] cowplot_1.1.3           DBI_1.2.3               lubridate_1.9.4         abind_1.4-8            
#>  [29] zlibbioc_1.48.2         rvest_1.0.4             purrr_1.0.2             R.utils_2.12.3         
#>  [33] ggraph_2.2.1            BiocGenerics_0.48.1     RCurl_1.98-1.16         hash_2.2.6.3           
#>  [37] yulab.utils_0.1.8       tweenr_2.0.3            rappdirs_0.3.3          GenomeInfoDbData_1.2.11
#>  [41] IRanges_2.36.0          S4Vectors_0.40.2        enrichplot_1.22.0       ggrepel_0.9.6          
#>  [45] tidytree_0.4.6          pheatmap_1.0.12         pkgdown_2.1.1           svglite_2.1.3          
#>  [49] codetools_0.2-20        DOSE_3.28.2             xml2_1.3.6              ggforce_0.4.2          
#>  [53] tidyselect_1.2.1        aplot_0.2.4             farver_2.1.2            viridis_0.6.5          
#>  [57] stats4_4.4.2            jsonlite_1.8.9          tidygraph_1.3.1         Formula_1.2-5          
#>  [61] systemfonts_1.1.0       tools_4.4.2             progress_1.2.3          treeio_1.26.0          
#>  [65] ragg_1.3.3              Rcpp_1.0.13-1           glue_1.8.0              gridExtra_2.3          
#>  [69] xfun_0.49               qvalue_2.34.0           GenomeInfoDb_1.38.8     withr_3.0.2            
#>  [73] fastmap_1.2.0           digest_0.6.37           gridGraphics_0.5-1      timechange_0.3.0       
#>  [77] R6_2.5.1                textshaping_0.4.1       colorspace_2.1-1        GO.db_3.18.0           
#>  [81] gtools_3.9.5            RSQLite_2.3.9           R.methodsS3_1.8.2       tidyr_1.3.1            
#>  [85] generics_0.1.3          data.table_1.16.4       graphlayouts_1.2.1      prettyunits_1.2.0      
#>  [89] httr_1.4.7              htmlwidgets_1.6.4       scatterpie_0.2.4        inflection_1.3.6       
#>  [93] pkgconfig_2.0.3         gtable_0.3.6            blob_1.2.4              XVector_0.42.0         
#>  [97] shadowtext_0.1.4        clusterProfiler_4.10.1  OmnipathR_3.15.2        htmltools_0.5.8.1      
#> [101] carData_3.0-5           fgsea_1.33.1            scales_1.3.0            kableExtra_1.4.0       
#> [105] Biobase_2.62.0          png_0.1-8               ggfun_0.1.8             knitr_1.49             
#> [109] rstudioapi_0.17.1       tzdb_0.4.0              reshape2_1.4.4          nlme_3.1-166           
#> [113] checkmate_2.3.2         curl_6.0.1              cachem_1.1.0            stringr_1.5.1          
#> [117] parallel_4.4.2          vipor_0.4.7             HDO.db_0.99.1           AnnotationDbi_1.64.1   
#> [121] desc_1.4.3              pillar_1.10.0           grid_4.4.2              logger_0.4.0           
#> [125] vctrs_0.6.5             ggpubr_0.6.0            car_3.1-3               beeswarm_0.4.0         
#> [129] evaluate_1.0.1          readr_2.1.5             cli_3.6.3               compiler_4.4.2         
#> [133] crayon_1.5.3            ggsignif_0.6.4          labeling_0.4.3          plyr_1.8.9             
#> [137] fs_1.6.5                ggbeeswarm_0.7.2        stringi_1.8.4           viridisLite_0.4.2      
#> [141] BiocParallel_1.36.0     munsell_0.5.1           Biostrings_2.70.3       lazyeval_0.2.2         
#> [145] GOSemSim_2.28.1         Matrix_1.7-1            hms_1.1.3               patchwork_1.3.0        
#> [149] bit64_4.5.2             KEGGREST_1.42.0         statmod_1.5.0           igraph_2.1.2           
#> [153] broom_1.0.7             memoise_2.0.1           bslib_0.8.0             ggtree_3.10.1          
#> [157] fastmatch_1.1-4         bit_4.5.0.1             readxl_1.4.3            gson_0.1.0             
#> [161] ape_5.8-1
Badia-I-Mompel, Pau, Jesús Vélez Santiago, Jana Braunger, Celina Geiss, Daniel Dimitrov, Sophia Müller-Dott, Petr Taus, et al. 2022. “decoupleR: Ensemble of Computational Methods to Infer Biological Activities from Omics Data.” Bioinformatics Advances 2 (1): vbac016. https://doi.org/10.1093/bioadv/vbac016.
Bijlsma, Sabina, Ivana Bobeldijk, Elwin R Verheij, Raymond Ramaker, Sunil Kochhar, Ian A Macdonald, Ben van Ommen, and Age K Smilde. 2006. “Large-Scale Human Metabolomics Studies: A Strategy for Data (Pre-) Processing and Validation.” Analytical Chemistry 78 (2): 567–74. https://doi.org/10.1021/ac051495j.
Hotelling, Harold. 1931. “The Generalization of Student’s Ratio.” The Annals of Mathematical Statistics 2 (3): 360–78. https://doi.org/10.1214/aoms/1177732979.
Kanehisa, M, and S Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28 (1): 27–30. https://doi.org/10.1093/nar/28.1.27.
Lord, Samuel J, Katrina B Velle, R Dyche Mullins, and Lillian K Fritz-Laylin. 2020. “SuperPlots: Communicating Reproducibility and Variability in Cell Biology.” The Journal of Cell Biology, no. 6 (June). https://doi.org/10.1083/jcb.202001064.
Wei, Runmin, Jingye Wang, Mingming Su, Erik Jia, Shaoqiu Chen, Tianlu Chen, and Yan Ni. 2018. “Missing Value Imputation Approach for Mass Spectrometry-Based Metabolomics Data.” Scientific Reports 8 (1): 663. https://doi.org/10.1038/s41598-017-19120-0.
Wulff, Jacob E., and Matthew W. Mitchell. 2018. “A Comparison of Various Normalization Methods for LC/MS Metabolomics Data.” Advances in Bioscience and Biotechnology 09 (08): 339–51. https://doi.org/10.4236/abb.2018.98022.