# Usage

This documentation is based on the scripts and data files provided in the PHONEMeS Main Example folder.

The documentation is based on PHONEMeS 0.2.3, run locally using R 3.1.0 (in RStudio) and on a cluster using R 2.15.2 and LSF 8.

These scripts require the igraph package (version 0.7.1) and the BioNet package (version 1.24.0).

The result networks are visualized using Cytoscape 2.8.0 (although we are working on a plugin for Cytoscape 3 to automatically import the contents of a typical PHONEMeS results folder and to produce annotated networks).

# I. Data Preparation

The untargeted phosphoproteomics data used to identify signaling models in PHONEMeS is a data matrix containing the peak heights of each measured peptide across all conditions (treatments and control) and replicates. Each peptide maps to specific sites on a set of proteins. If the data is not normalized, we quantile normalize the log10 of the raw intensity values. We then use a linear model to estimate the effect of each drug relative to the control state: we compute the log fold change between the two conditions and estimate its significance using the t-statistic computed by the function eBayes from the Bioconductor package limma.
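
The normalization step can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not reproduce limma: ties are not averaged, the log fold change is taken as a simple difference of mean log intensities, and the moderated statistics from eBayes are left out. All function names are hypothetical.

```python
import math


def log10_matrix(columns):
    """Take log10 of every raw intensity; each inner list is one sample."""
    return [[math.log10(value) for value in column] for column in columns]


def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    Assumption: ties are broken by original position, not averaged.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(column) for column in columns]
    # Mean across samples of the values that share the same rank.
    rank_means = [
        sum(column[rank] for column in sorted_cols) / len(columns)
        for rank in range(n_rows)
    ]
    normalized = []
    for column in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: column[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


def log_fold_change(treated, control):
    """Mean log intensity under treatment minus mean under control.

    Assumption: a plain difference of means for one peptide.
    """
    return sum(treated) / len(treated) - sum(control) / len(control)
```

After `quantile_normalize(log10_matrix(raw))`, every sample contains the same set of values, only in a different order, so differences between samples no longer reflect overall intensity shifts.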

As a second step, we fit a Gaussian Mixture Model (GMM) to each peptide across all conditions and select only those peptides whose distributions are better described by 2 components, and which therefore show a Boolean behaviour (perturbed/non-perturbed) across conditions. For this we can use the R package mclust, which runs an expectation-maximization method to fit the measurements to a mixture of Gaussian distributions. Furthermore, we exclude those cases in which the two density curves overlap by more than 10% over a range covering the whole data, in order to avoid cases where two conditions are found because of a tightly clustered set of measurements. Then, for each point, we compute a score S as the log ratio of the probabilities of belonging to the control state and to the perturbed state.
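
Written out, and assuming the sign convention implied by the thresholds below, the score could take the following form. The symbols and the base of the logarithm are assumptions, not taken from the original.

$$
S_{i,j} = \log \frac{P_{\text{control}}(i,j)}{P_{\text{perturbed}}(i,j)}
$$

Here $P_{\text{control}}(i,j)$ and $P_{\text{perturbed}}(i,j)$ denote the probabilities that the measurement of peptide $i$ under condition $j$ belongs to the control and the perturbed component of the fitted mixture, respectively.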

Peptide i is called perturbed under condition j if its score is below -0.5, and is considered to be in the control state if its score is above 0.5. Scores between -0.5 and 0.5 are considered undetermined.
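
The thresholding rule can be sketched as follows. This is an illustration in Python, not part of PHONEMeS. The function names are hypothetical, the base-10 logarithm is an assumption, and scores exactly equal to -0.5 or 0.5 are treated as undetermined, which the text does not specify.

```python
import math


def log_ratio_score(p_control, p_perturbed):
    """Log ratio of control to perturbed probability (base 10 is an assumption)."""
    return math.log10(p_control / p_perturbed)


def classify_score(score, threshold=0.5):
    """Map a score to 'perturbed', 'control', or 'undetermined'."""
    if score < -threshold:
        return "perturbed"
    if score > threshold:
        return "control"
    # Includes the boundary values, by assumption.
    return "undetermined"
```

For example, a measurement that is nine times more likely to belong to the perturbed component than to the control component gets a negative score below -0.5 and is called perturbed.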

# II. Running PHONEMeS on a Cluster

### 1. Prepare the data for the cluster

Run prepOptim.R to produce data4cluster_n.RData, where n denotes the index of the independent optimization run. prepOptim.R takes a network object and a data object (resulting from data normalization, summarization, and Gaussian mixture modeling) and produces the objects needed to run PHONEMeS on a cluster.

### 2. Move to cluster

Copy the prepared data as well as the cluster scripts (which can be found here) to the cluster:

• data4cluster_n.RData
• processGx.R
• runScriptGx.sh
• scriptGxopt_50models.R
• import.R

Please note that the scripts may require minor changes (e.g. modifications of the paths in the scripts).

### 3. Is the package installed?

Make sure that PHONEMeS is installed on your cluster. If this is not the case, please refer to the installation guide.

### 4. Create results directory

Each independent optimization run requires its own results folder, named with the index n of that run:

mkdir Results_n


### 5. Run one independent optimization

This step should be repeated multiple times with different indices. The script runScriptGx.sh will submit the necessary jobs to the queue of your LSF system. Make sure to modify the script as necessary (e.g. correcting the queue name).

./runScriptGx.sh n
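
Because steps 4 and 5 are repeated for every run index, they can be scripted. The sketch below is one way to do this in Python. It assumes that runScriptGx.sh sits in the working directory and takes the run index as its only argument, as in the command above. The function name and the choice of indices are hypothetical.

```python
from pathlib import Path


def prepare_runs(indices, base_dir="."):
    """Create Results_<n> for each run index and return the submit commands.

    Nothing is submitted here; the caller decides when to run the commands.
    """
    commands = []
    for n in indices:
        # Equivalent to `mkdir Results_n`, but safe to repeat.
        (Path(base_dir) / f"Results_{n}").mkdir(parents=True, exist_ok=True)
        commands.append(["./runScriptGx.sh", str(n)])
    return commands
```

Each returned command can then be passed to `subprocess.run(command, check=True)` on the cluster login node, for instance for the indices in `range(1, 11)`.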


### 6. Move results to your local machine

Copy the following files back to your local machine in order to process, combine, and visualize the results of the PHONEMeS analysis:

• pn_imported.RData
• optim_n.pdf

# III. Post-Optimization Analysis

### Process each independent optimization

The R script postOptim.R processes the results of a single optimization, so it must be run once for each independent optimization run. The script may require minor modifications to the file locations.

### Combine multiple independent optimizations

Once postOptim.R has been run on all independent optimizations and has produced the corresponding objects_pn.RData files, run the R script comb_optim.R. It outputs the combined plots as well as the final resulting networks (maximal input averaged frequencies across independent optimizations, maximal scoring paths by averaged frequencies, etc.).

# IV. Visualize the Result Networks

The resulting .txt files (from either single or combined optimizations) can be imported as tables into Cytoscape.

1. Start Cytoscape and select “From Network File…”.

2. Select nTagMaxIn_comb.txt and set the columns K.ID as source and S.cc as target.

3. Import edge frequencies:
• “Import Table from File”: Select combOptim_EA.txt.
• Select “Import Data as: Edge Table Column” and “Key Column for Network: SID”.
• Select the column SID as Key and the column f50 as Attribute.
4. Import node attributes:
• “Import Table from File”: Select AllNodes_nodesP_NA_pn.txt.
• Select “Import Data as: Node Table Column”.
5. Import visual properties:
• “Import Styles”: Select PHONEMeS_vizmap.props in the PHONEMeS repository (here).
• Finally, select the PHONEMeS style in the “Style” tab.