
Performs metabolite clustering analysis: computes clusters of metabolites based on regulatory rules between intracellular and culture media (consumption-release, core experiment) metabolomics.

Usage

mca_core(
  data_intra,
  data_core,
  metadata_info_intra = c(ValueCol = "Log2FC", StatCol = "p.adj", cutoff_stat = 0.05,
    ValueCutoff = 1),
  metadata_info_core = c(DirectionCol = "core", ValueCol = "Log2(Distance)", StatCol =
    "p.adj", cutoff_stat = 0.05, ValueCutoff = 1),
  feature = "Metabolite",
  save_table = "csv",
  method_background = "Intra&core",
  path = NULL
)

Arguments

data_intra

Data frame with your intracellular results (e.g. from dma), containing metabolites in rows with corresponding Log2FC and stat (p-value, adjusted p-value) columns.

data_core

Data frame with your consumption-release (core) results (e.g. from dma), containing metabolites in rows with corresponding Log2FC and stat (p-value, adjusted p-value) columns. Here we additionally require a direction column (see DirectionCol in metadata_info_core).

metadata_info_intra

Optional: Column names and cutoffs for the intracellular metabolomics, including the value column (e.g. Log2FC, Log2Diff, t.val) and the stats column (e.g. p.adj, p.val). This must include: c(ValueCol = ColumnName_data_intra, StatCol = ColumnName_data_intra, cutoff_stat = NumericValue, ValueCutoff = NumericValue). Default = c(ValueCol = "Log2FC", StatCol = "p.adj", cutoff_stat = 0.05, ValueCutoff = 1)

metadata_info_core

Optional: Column names and cutoffs for the consumption-release metabolomics, including the direction column, the value column (e.g. Log2Diff, t.val) and the stats column (e.g. p.adj, p.val). This must include: c(DirectionCol = ColumnName_data_core, ValueCol = ColumnName_data_core, StatCol = ColumnName_data_core, cutoff_stat = NumericValue, ValueCutoff = NumericValue). Default = c(DirectionCol = "core", ValueCol = "Log2(Distance)", StatCol = "p.adj", cutoff_stat = 0.05, ValueCutoff = 1)

feature

Optional: Name of the column containing the metabolite identifiers. This must be the same in each of your input data frames. Default = "Metabolite"

save_table

Optional: File type used to save the analysis results: "csv", "xlsx" or "txt". Default = "csv"

method_background

Optional: Background method: "Intra|core", "Intra&core", "core", "Intra" or "*". Default = "Intra&core"

path

Optional: Path to the folder in which the results should be saved. Default = NULL
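
The metadata_info_intra and metadata_info_core arguments combine column names and numeric cutoffs in a single named vector. In R, c() coerces such a mix to character, so the cutoffs are stored as strings. The sketch below illustrates this in base R only; get_cutoff is a hypothetical helper for illustration and is not part of MetaProViz, and how mca_core converts the cutoffs internally is not shown in this page.

```r
# Named vector mixing column names and numeric cutoffs, as in the Usage defaults.
metadata_info_intra <- c(ValueCol = "Log2FC", StatCol = "p.adj",
                         cutoff_stat = 0.05, ValueCutoff = 1)

# c() coerces all elements to a common type, so this is a character vector.
class(metadata_info_intra)

# Hypothetical helper (not part of MetaProViz): recover a numeric cutoff by name.
get_cutoff <- function(info, name) as.numeric(info[[name]])

stat_cutoff  <- get_cutoff(metadata_info_intra, "cutoff_stat")
value_cutoff <- get_cutoff(metadata_info_intra, "ValueCutoff")
```

When overriding the defaults, it is probably safest to supply all four (or five, for the core data) named entries, since the argument description states that they must all be included.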

Value

List of two data frames: 1. a summary of the cluster counts and 2. the detailed information for each metabolite in the clusters.
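
To give a feel for the shape of such a result, here is a deliberately simplified base R sketch that labels toy metabolites using a stat cutoff and a value cutoff and then returns a count summary plus the per-metabolite detail. The labels ("UP", "DOWN", "No Change"), the list element names and the single-table logic are assumptions made for illustration; the actual regulatory rules of mca_core combine intracellular and core data and are not reproduced here.

```r
# Toy results table; column names follow the defaults shown in Usage.
toy <- data.frame(
  Metabolite = c("A", "B", "C", "D"),
  Log2FC     = c(2.0, -1.5, 0.3, 1.8),
  p.adj      = c(0.01, 0.03, 0.20, 0.30)
)

stat_cutoff  <- 0.05
value_cutoff <- 1

# Hypothetical labels: significant and beyond the value cutoff, else "No Change".
significant <- toy$p.adj < stat_cutoff
toy$Label <- ifelse(significant & toy$Log2FC >  value_cutoff, "UP",
             ifelse(significant & toy$Log2FC < -value_cutoff, "DOWN", "No Change"))

# 1. summary of counts per label, 2. per-metabolite detail (element names are illustrative).
result <- list(summary = as.data.frame(table(Label = toy$Label)),
               detail  = toy)
```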

Examples


Media <- medium_raw %>% tibble::column_to_rownames("Code")
ResM <- MetaProViz::processing(data = Media[-c(40:45), -c(1:3)],
                               metadata_sample = Media[-c(40:45), c(1:3)],
                               metadata_info = c(Conditions = "Conditions",
                                                 Biological_Replicates = "Biological_Replicates",
                                                 core_norm_factor = "GrowthFactor",
                                                 core_media = "blank"),
                               core = TRUE)
#> For Consumption Release experiment we are using the method from Jain M.  REF: Jain et. al, (2012), Science 336(6084):1040-4, doi: 10.1126/science.1218595.
#> feature_filtering: Here we apply the modified 80%-filtering rule that takes the class information (Column `Conditions`) into account, which additionally reduces the effect of missing values (REF: Yang et. al., (2015), doi: 10.3389/fmolb.2015.00004). Filtering value selected: 0.8
#> 3 metabolites where removed: N-acetylaspartylglutamate, hypotaurine, S-(2-succinyl)cysteine
#> Missing Value Imputation: Missing value imputation is performed, as a complementary approach to address the missing value problem, where the missing values are imputing using the `half minimum value`. REF: Wei et. al., (2018), Reports, 8, 663, doi:https://doi.org/10.1038/s41598-017-19120-0
#> NA values were found in Control_media samples for metabolites. For metabolites including NAs mvi is performed unless all samples of a metabolite are NA.
#> Metabolites with high NA load (>20%) in Control_media samples are: dihydroorotate.
#> Metabolites with only NAs (=100%) in Control_media samples are: hydroxyphenylpyruvate. Those NAs are set zero as we consider them true zeros
#> total Ion Count (tic) normalization: total Ion Count (tic) normalization is used to reduce the variation from non-biological sources, while maintaining the biological variation. REF: Wulff et. al., (2018), Advances in Bioscience and Biotechnology, 9, 339-351, doi:https://doi.org/10.4236/abb.2018.98022
#> 8 of variables have high variability (CV > 30) in the core_media control samples. Consider checking the pooled samples to decide whether to remove these metabolites or not.
#> Warning: The core_media samples  MS51-06  were found to be different from the rest. They will not be included in the sum of the core_media samples.
#> core data are normalised by substracting mean (blank) from each sample and multiplying with the core_norm_factor
#> Outlier detection: Identification of outlier samples is performed using Hotellin's T2 test to define sample outliers in a mathematical way (Confidence = 0.99 ~ p.val < 0.01) (REF: Hotelling, H. (1931), Annals of Mathematical Statistics. 2 (3), 360-378, doi:https://doi.org/10.1214/aoms/1177732979). hotellins_confidence value selected: 0.99
#> There are possible outlier samples in the data
#> Filtering round  1  Outlier Samples:  MS51-06  
#> Filtering round  2  Outlier Samples:  MS51-09  

MediaDMA <- MetaProViz::dma(data = ResM[["DF"]][["Preprocessing_output"]][, -c(1:4)],
                            metadata_sample = ResM[["DF"]][["Preprocessing_output"]][, c(1:4)],
                            metadata_info = c(Conditions = "Conditions", Numerator = NULL, Denominator = "HK2"),
                            pval = "aov",
                            core = TRUE)
#> There are no NA/0 values
#> For the condition HK2 75.71 % of the metabolites follow a normal distribution and 24.29 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-O 95.83 % of the metabolites follow a normal distribution and 4.17 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M1A 97.22 % of the metabolites follow a normal distribution and 2.78 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M2A 88.89 % of the metabolites follow a normal distribution and 11.11 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSRC2 93.06 % of the metabolites follow a normal distribution and 6.94 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSLM1B 86.11 % of the metabolites follow a normal distribution and 13.89 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition RFX631 97.22 % of the metabolites follow a normal distribution and 2.78 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For 62.86% of metabolites the group variances are equal.
#> No condition was specified as numerator and HK2 was selected as a denominator. Performing multiple testing `all-vs-one` using aov.

IntraDMA <- intracell_raw %>% tibble::column_to_rownames("Code")

Res <- MetaProViz::mca_core(data_intra = IntraDMA %>% tibble::rownames_to_column("Metabolite"),
                            data_core = MediaDMA[["dma"]][["786-M1A_vs_HK2"]])
#> Error in check_param_mca(data_c1 = NULL, data_c2 = NULL, data_core = data_core,     data_intra = data_intra, metadata_info_c1 = NULL, metadata_info_c2 = NULL,     metadata_info_core = metadata_info_core, metadata_info_intra = metadata_info_intra,     method_background = method_background, feature = feature,     save_table = save_table): The Log2FC column selected as ValueCol in metadata_info_intra was not found in data_intra. Please check your input.
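
The error above reports that the ValueCol ("Log2FC") named in metadata_info_intra is not a column of data_intra. In this example data_intra is built from intracell_raw rather than from a dma result, which is the likely reason, although that is an inference from the message. A quick pre-flight check along the following lines can surface the problem before calling mca_core. This is a minimal base R sketch: check_columns and the toy raw_like table are hypothetical and not part of MetaProViz.

```r
# Hypothetical pre-flight check (not part of MetaProViz): verify that the
# columns named in a metadata_info vector exist in the data frame.
check_columns <- function(df, info, keys = c("ValueCol", "StatCol")) {
  wanted  <- unname(info[intersect(keys, names(info))])
  missing <- setdiff(wanted, colnames(df))
  if (length(missing) > 0) {
    stop("Columns not found in data: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-in for a raw table that lacks Log2FC / p.adj columns.
raw_like <- data.frame(Metabolite = c("A", "B"), Sample1 = c(1.2, 3.4))
info <- c(ValueCol = "Log2FC", StatCol = "p.adj", cutoff_stat = 0.05, ValueCutoff = 1)

# Capture the error message instead of aborting.
outcome <- tryCatch(check_columns(raw_like, info),
                    error = function(e) conditionMessage(e))
```

If the check fails, either pass a data frame that already contains the value and stat columns (e.g. results from dma) or point ValueCol and StatCol at columns that do exist.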