Performs metabolite clustering analysis and computes clusters of metabolites based on regulatory rules between intracellular metabolomics and culture media metabolomics from a consumption-release (core) experiment.
Source: R/MetaboliteClusteringAnalysis.R
mca_core.Rd
Usage
mca_core(
  data_intra,
  data_core,
  metadata_info_intra = c(ValueCol = "Log2FC", StatCol = "p.adj", cutoff_stat = 0.05,
    ValueCutoff = 1),
  metadata_info_core = c(DirectionCol = "core", ValueCol = "Log2(Distance)", StatCol =
    "p.adj", cutoff_stat = 0.05, ValueCutoff = 1),
  feature = "Metabolite",
  save_table = "csv",
  method_background = "Intra&core",
  path = NULL
)
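The defaults above combine column names and numeric cutoffs in a single `c()` call. In base R such a vector is coerced to character, so the cutoffs are stored as strings. The sketch below only illustrates this base R behaviour; how `mca_core` converts the values internally is not shown on this page, and the variable names are hypothetical.

```r
# Hypothetical variable name; the values are the documented defaults.
info_intra <- c(ValueCol = "Log2FC", StatCol = "p.adj",
                cutoff_stat = 0.05, ValueCutoff = 1)

# Mixing strings and numbers in c() yields a named character vector.
class(info_intra)

# The cutoffs therefore need converting back before any numeric comparison.
cutoff_stat  <- as.numeric(info_intra[["cutoff_stat"]])
value_cutoff <- as.numeric(info_intra[["ValueCutoff"]])
```

When passing custom settings, keeping the same element names (`ValueCol`, `StatCol`, `cutoff_stat`, `ValueCutoff`) matters more than their order, since the elements are named.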
Arguments
- data_intra
Data frame with the intracellular results (e.g. from dma), containing metabolites in rows with the corresponding Log2FC and stat (p-value, p.adjusted) value columns.
- data_core
Data frame with the consumption-release results (e.g. from dma), containing metabolites in rows with the corresponding Log2FC and stat (p-value, p.adjusted) value columns. Here we additionally require
- metadata_info_intra
Optional: Column names and cutoffs for the intracellular metabolomics, including the value column (e.g. Log2FC, Log2Diff, t.val) and the stats column (e.g. p.adj, p.val). This must include: c(ValueCol = ColumnName_data_intra, StatCol = ColumnName_data_intra, cutoff_stat = NumericValue, ValueCutoff = NumericValue). Default = c(ValueCol = "Log2FC", StatCol = "p.adj", cutoff_stat = 0.05, ValueCutoff = 1)
- metadata_info_core
Optional: Column names and cutoffs for the consumption-release metabolomics, including the direction column, the value column (e.g. Log2Diff, t.val) and the stats column (e.g. p.adj, p.val). This must include: c(DirectionCol = ColumnName_data_core, ValueCol = ColumnName_data_core, StatCol = ColumnName_data_core, cutoff_stat = NumericValue, ValueCutoff = NumericValue). Default = c(DirectionCol = "core", ValueCol = "Log2(Distance)", StatCol = "p.adj", cutoff_stat = 0.05, ValueCutoff = 1)
- feature
Optional: Name of the column containing the metabolite identifiers. This MUST BE THE SAME in each of your input files. Default = "Metabolite"
- save_table
Optional: File type for the saved analysis results: "csv", "xlsx" or "txt". Default: "csv"
- method_background
Optional: Background method: "Intra|core", "Intra&core", "core", "Intra" or "*". Default = "Intra&core"
- path
Optional: Path to the folder where the results should be saved. Default: NULL
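The `method_background` options read like set operations on the metabolites detected in each dataset. The following is a minimal sketch of that reading, assuming that "Intra&core" keeps metabolites present in both tables and "Intra|core" keeps those present in either one. The helper `background_features()` and the toy identifiers are hypothetical and do not reproduce the package's internal logic; the "*" option is left out because its meaning is not described here.

```r
# Hypothetical helper: select the background set of metabolite identifiers.
# ASSUMPTION: "&" means intersection and "|" means union of the two ID sets.
background_features <- function(intra_ids, core_ids, method = "Intra&core") {
  switch(method,
    "Intra&core" = intersect(intra_ids, core_ids),
    "Intra|core" = union(intra_ids, core_ids),
    "Intra"      = intra_ids,
    "core"       = core_ids,
    stop("Unknown method: ", method)
  )
}

# Toy identifiers, for illustration only.
intra_ids <- c("alanine", "citrate", "lactate")
core_ids  <- c("citrate", "lactate", "glucose")

both   <- background_features(intra_ids, core_ids, "Intra&core")
either <- background_features(intra_ids, core_ids, "Intra|core")
```

Under this reading, the default "Intra&core" is the strictest choice, because a metabolite enters the background only if it was measured in both experiments.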
Value
List of two data frames: 1. a summary of the cluster counts and 2. the detailed information for each metabolite in the clusters.
Examples
Media <- medium_raw %>% tibble::column_to_rownames("Code")
ResM <- MetaProViz::processing(
  data = Media[-c(40:45), -c(1:3)],
  metadata_sample = Media[-c(40:45), c(1:3)],
  metadata_info = c(Conditions = "Conditions", Biological_Replicates = "Biological_Replicates", core_norm_factor = "GrowthFactor", core_media = "blank"),
  core = TRUE
)
#> For Consumption Release experiment we are using the method from Jain M. REF: Jain et. al, (2012), Science 336(6084):1040-4, doi: 10.1126/science.1218595.
#> feature_filtering: Here we apply the modified 80%-filtering rule that takes the class information (Column `Conditions`) into account, which additionally reduces the effect of missing values (REF: Yang et. al., (2015), doi: 10.3389/fmolb.2015.00004). Filtering value selected: 0.8
#> 3 metabolites where removed: N-acetylaspartylglutamate, hypotaurine, S-(2-succinyl)cysteine
#> Missing Value Imputation: Missing value imputation is performed, as a complementary approach to address the missing value problem, where the missing values are imputing using the `half minimum value`. REF: Wei et. al., (2018), Reports, 8, 663, doi:https://doi.org/10.1038/s41598-017-19120-0
#> NA values were found in Control_media samples for metabolites. For metabolites including NAs mvi is performed unless all samples of a metabolite are NA.
#> Metabolites with high NA load (>20%) in Control_media samples are: dihydroorotate.
#> Metabolites with only NAs (=100%) in Control_media samples are: hydroxyphenylpyruvate. Those NAs are set zero as we consider them true zeros
#> total Ion Count (tic) normalization: total Ion Count (tic) normalization is used to reduce the variation from non-biological sources, while maintaining the biological variation. REF: Wulff et. al., (2018), Advances in Bioscience and Biotechnology, 9, 339-351, doi:https://doi.org/10.4236/abb.2018.98022
#> 8 of variables have high variability (CV > 30) in the core_media control samples. Consider checking the pooled samples to decide whether to remove these metabolites or not.
#> Warning: The core_media samples MS51-06 were found to be different from the rest. They will not be included in the sum of the core_media samples.
#> core data are normalised by substracting mean (blank) from each sample and multiplying with the core_norm_factor
#> Outlier detection: Identification of outlier samples is performed using Hotellin's T2 test to define sample outliers in a mathematical way (Confidence = 0.99 ~ p.val < 0.01) (REF: Hotelling, H. (1931), Annals of Mathematical Statistics. 2 (3), 360-378, doi:https://doi.org/10.1214/aoms/1177732979). hotellins_confidence value selected: 0.99
#> There are possible outlier samples in the data
#> Filtering round 1 Outlier Samples: MS51-06
#> Filtering round 2 Outlier Samples: MS51-09
MediaDMA <- MetaProViz::dma(
  data = ResM[["DF"]][["Preprocessing_output"]][, -c(1:4)],
  metadata_sample = ResM[["DF"]][["Preprocessing_output"]][, c(1:4)],
  metadata_info = c(Conditions = "Conditions", Numerator = NULL, Denominator = "HK2"),
  pval = "aov",
  core = TRUE
)
#> There are no NA/0 values
#> For the condition HK2 75.71 % of the metabolites follow a normal distribution and 24.29 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-O 95.83 % of the metabolites follow a normal distribution and 4.17 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M1A 97.22 % of the metabolites follow a normal distribution and 2.78 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition 786-M2A 88.89 % of the metabolites follow a normal distribution and 11.11 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSRC2 93.06 % of the metabolites follow a normal distribution and 6.94 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition OSLM1B 86.11 % of the metabolites follow a normal distribution and 13.89 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For the condition RFX631 97.22 % of the metabolites follow a normal distribution and 2.78 % of the metabolites are not-normally distributed according to the shapiro test. You have chosen aov, which is for parametric Hypothesis testing. `shapiro.test` ignores missing values in the calculation.
#> For 62.86% of metabolites the group variances are equal.
#> No condition was specified as numerator and HK2 was selected as a denominator. Performing multiple testing `all-vs-one` using aov.
IntraDMA <- intracell_raw %>% tibble::column_to_rownames("Code")
Res <- MetaProViz::mca_core(
  data_intra = IntraDMA %>% tibble::rownames_to_column("Metabolite"),
  data_core = MediaDMA[["dma"]][["786-M1A_vs_HK2"]]
)
#> Error in check_param_mca(data_c1 = NULL, data_c2 = NULL, data_core = data_core, data_intra = data_intra, metadata_info_c1 = NULL, metadata_info_c2 = NULL, metadata_info_core = metadata_info_core, metadata_info_intra = metadata_info_intra, method_background = method_background, feature = feature, save_table = save_table): The Log2FC column selected as ValueCol in metadata_info_intra was not found in data_intra. Please check your input.
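The error above is raised because the table passed as `data_intra` is derived directly from `intracell_raw` and has no `Log2FC` column, which the default `metadata_info_intra` expects. A small pre-check along the following lines can catch this before calling `mca_core`. It is a sketch in base R only; the helper `missing_info_cols()` and the toy data frame are hypothetical and not part of MetaProViz.

```r
# Hypothetical helper: return the column names requested in a metadata_info
# vector (by default ValueCol and StatCol) that are absent from the data.
missing_info_cols <- function(data, info, keys = c("ValueCol", "StatCol")) {
  wanted <- unname(info[intersect(keys, names(info))])
  setdiff(wanted, colnames(data))
}

# Toy table without a Log2FC column, for illustration only.
toy_intra <- data.frame(Metabolite = c("m1", "m2"), p.adj = c(0.01, 0.20))
info_intra <- c(ValueCol = "Log2FC", StatCol = "p.adj",
                cutoff_stat = 0.05, ValueCutoff = 1)

missing <- missing_info_cols(toy_intra, info_intra)
if (length(missing) > 0) {
  message("Columns not found in data_intra: ", paste(missing, collapse = ", "))
}
```

For `data_core`, the same check could be run with `keys = c("DirectionCol", "ValueCol", "StatCol")`.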