This documentation is based on the scripts and data files provided in the PHONEMeS Main Example folder.
The documentation is based on PHONEMeS 0.2.3, run locally using R 3.1.0 (in RStudio) and on a cluster using R 2.15.2 and LSF 8.
The result networks are visualized using Cytoscape 2.8.0 (although we are working on a plugin for Cytoscape 3 to automatically import the contents of a typical PHONEMeS results folder and to produce annotated networks).
The untargeted phosphoproteomics data used to identify the signaling models in PHONEMeS is a data matrix containing the peak heights of each measured peptide across all conditions (treatments and control) and replicates; each peptide maps to specific sites on a set of proteins. If the data is not normalized, we quantile normalize the log10 of the raw intensity values. We then use a linear model to estimate the effect of each drug relative to the control state: we compute the log fold change between the two conditions and assess its significance using the t-statistic computed by the function eBayes from the Bioconductor package limma.
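As a rough illustration of the normalization step, the sketch below quantile normalizes log10 intensities and computes a per-peptide log fold change in plain Python. The function name and the toy values are assumptions made for this example. The actual pipeline works in R and relies on limma for the linear model and the t-statistics, which are not reproduced here. Ties are broken by position rather than averaged.

```python
import math


def quantile_normalize(columns):
    """Give every sample (column) the same empirical distribution."""
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all samples.
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy raw peak heights: one list per sample, one entry per peptide.
raw_control = [10.0, 100.0, 1000.0]
raw_drug = [1000.0, 100.0, 100000.0]

log_control = [math.log10(v) for v in raw_control]
log_drug = [math.log10(v) for v in raw_drug]

norm_control, norm_drug = quantile_normalize([log_control, log_drug])

# Log fold change of drug over control for each peptide.
log_fold_change = [d - c for d, c in zip(norm_drug, norm_control)]
print(log_fold_change)
```

With real data there would be several replicates per condition, and the fold changes would be estimated by the linear model rather than by a simple difference of two samples.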
As a second step, we fit a Gaussian Mixture Model (GMM) to each peptide across all conditions and select only those peptides whose distributions are better described by 2 components, and which therefore show a Boolean behaviour across conditions (perturbed/non-perturbed). For this we can use the R package mclust (distributed via CRAN), which runs an expectation-maximization method to fit the measurements to a mixture of Gaussian distributions. Furthermore, we exclude those cases in which the two density curves overlap by more than 10% over a range covering the whole data, in order to avoid cases where two components are found only because of a tightly clustered set of measurements. Then, to each point, we assign the score S, defined as the log ratio of the probability of belonging to the control state to the probability of belonging to the perturbed state, so that negative scores indicate perturbation.
Peptide i is called perturbed under condition j if this score is below -0.5, and is considered to be in the control state if the score is above 0.5. Values between -0.5 and 0.5 are considered undetermined.
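The scoring and thresholding described above can be sketched as follows. This is only an illustration under stated assumptions: the mixture parameters are made up rather than fitted, the logarithm is taken in base 10, and the function names are hypothetical. In the actual workflow the two components come from the fitted mixture model.

```python
import math


def log_density(x, weight, mean, sd):
    """Natural log of weight * Normal(x | mean, sd)."""
    z = (x - mean) / sd
    return math.log(weight) - math.log(sd) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def score(x, control, perturbed):
    """log10 of P(control | x) / P(perturbed | x).

    Each component is a (weight, mean, sd) tuple. The base-10 logarithm is
    an assumption of this sketch.
    """
    # The mixture normalizer cancels in the ratio of the two posteriors.
    return (log_density(x, *control) - log_density(x, *perturbed)) / math.log(10.0)


def call_state(s, threshold=0.5):
    """Apply the -0.5 / 0.5 cut-offs to a score."""
    if s < -threshold:
        return "perturbed"
    if s > threshold:
        return "control"
    return "undetermined"


# Made-up, well separated components (weight, mean, sd).
control = (0.5, 0.0, 1.0)
perturbed = (0.5, 4.0, 1.0)

calls = [call_state(score(x, control, perturbed)) for x in (0.0, 2.0, 4.0)]
print(calls)
```

Working with log densities avoids division by a density that underflows to zero for measurements far from one of the components.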
prepOptim.R and produce
n denotes the index of the independent optimization run.
prepOptim.R takes a network object and a data object (resulting from data normalization, summarization, and Gaussian mixture modeling) in order to produce the objects needed to run PHONEMeS on a cluster.
Copy the prepared data as well as the cluster scripts (which can be found here) to the cluster:
Please note that the scripts may require minor changes (e.g. modifications of the paths in the scripts).
Make sure that PHONEMeS is installed on your cluster. If this is not the case, please refer to the installation guide.
Additionally, each independent optimization run requires a result folder with the index n of the optimization run.
This step should be repeated multiple times with different indices. The script runScriptGx.sh will submit the necessary jobs to the queue of your LSF system. Make sure to modify the script as necessary (e.g. correcting the queue name).
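The per-run result folders mentioned above could be created in one go with a small helper such as the one below. This is a sketch only: the folder prefix "Results_" and the function name are hypothetical, so adapt the naming to whatever your copies of the cluster scripts expect.

```python
import tempfile
from pathlib import Path


def make_result_folders(base_dir, indices, prefix="Results_"):
    """Create one result folder per run index ('Results_' is a hypothetical prefix)."""
    created = []
    for n in indices:
        folder = Path(base_dir) / f"{prefix}{n}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstrated in a temporary directory so that nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folders = make_result_folders(tmp, range(1, 4))
    folder_names = [f.name for f in folders]
    all_exist = all(f.is_dir() for f in folders)

print(folder_names)
```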
Copy the following files back to your local machine in order to process, combine, and visualize the results of the PHONEMeS analysis:
The R script postOptim.R processes the results of a single optimization, so it should be run once for each independent optimization run. The file may require minor modifications regarding file locations.
Once postOptim.R has been run on all independent optimizations and has produced the different files objects_pn.RData, run the R script comb_optim.R. It will output the combined plots as well as the final resulting network (maximal input averaged frequencies across independent optimizations, maximal scoring paths by averaged frequencies, etc.).
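To make the idea of "maximal input averaged frequencies" more concrete, the sketch below averages edge frequencies over several runs and keeps, for each target node, the incoming edge with the highest average. It illustrates the concept only and is not the implementation used by comb_optim.R. The data layout, the function names, and the choice to count an edge that is absent from a run as frequency 0 are all assumptions.

```python
from collections import defaultdict


def average_frequencies(runs):
    """Average each edge frequency over runs; an edge absent from a run counts as 0."""
    totals = defaultdict(float)
    for run in runs:
        for edge, freq in run.items():
            totals[edge] += freq
    return {edge: total / len(runs) for edge, total in totals.items()}


def max_input_per_target(avg):
    """For each target node, keep the incoming edge with the highest averaged frequency."""
    best = {}
    for (source, target), freq in avg.items():
        if target not in best or freq > best[target][1]:
            best[target] = (source, freq)
    return best


# Two made-up optimization runs: (source, target) -> edge frequency.
runs = [
    {("K1", "S1"): 0.8, ("K2", "S1"): 0.4},
    {("K1", "S1"): 0.6, ("K2", "S1"): 0.6, ("K2", "S2"): 0.2},
]

averaged = average_frequencies(runs)
best_inputs = max_input_per_target(averaged)
print({target: source for target, (source, _) in best_inputs.items()})
```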
Start Cytoscape and select “From Network File…”.
Select the file nTagMaxIn_comb.txt and set the columns K.ID as source and S.cc as target.
SID as Key and the column
PHONEMeS_vizmap.props in the PHONEMeS repository (here).
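Before importing the network into Cytoscape, it can be useful to check that the network file really contains the K.ID and S.cc columns. The sketch below does this in Python. It assumes that the file is tab-delimited with a header row, which is not confirmed here, and it uses made-up rows in place of the real file. The function name is hypothetical.

```python
import csv
import io


def read_edges(handle, source_col="K.ID", target_col="S.cc"):
    """Return (source, target) pairs from a tab-delimited table with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row[source_col], row[target_col]) for row in reader]


# Made-up rows standing in for nTagMaxIn_comb.txt; the identifiers are placeholders.
sample = "K.ID\tS.cc\nKINASE_A\tSITE_1\nKINASE_B\tSITE_2\n"
edges = read_edges(io.StringIO(sample))
print(edges)
```

For a real results folder, the same function can be given an open file handle instead of the in-memory sample. A KeyError indicates that one of the expected columns is missing or that the delimiter assumption is wrong.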