Train MISTy models

Trains multi-view models for all target markers and estimates the performance, the contributions of the view-specific models, and the importance of predictor markers for each target marker.

Usage

run_misty(
  views,
  results.folder = "results",
  seed = 42,
  target.subset = NULL,
  bypass.intra = FALSE,
  cv.folds = 10,
  cached = FALSE,
  append = FALSE,
  model.function = random_forest_model,
  ...
)

Arguments

views: view composition.
results.folder: path to the top level folder to store raw results.
seed: seed used for random sampling to ensure reproducibility.
target.subset: subset of targets to train models for. If NULL, models will be trained for markers in the intraview.
bypass.intra: a logical indicating whether to bypass training a baseline model on the intraview data (see Details).
cv.folds: number of cross-validation folds to consider for estimating the performance of the multi-view models.
cached: a logical indicating whether to cache the trained models and to reuse previously cached ones if they already exist for this sample.
append: a logical indicating whether to append the performance and coefficient files in the results.folder. Consider setting to TRUE when rerunning a workflow with different target.subset parameters.
model.function: a function used to model each view. The default is random_forest_model. Other models included in mistyR are gradient_boosting_model, bagged_mars_model, mars_model, linear_model, svm_model and mlp_model.
...: all additional parameters are passed to the chosen ML model for training the view-specific models.
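
As an illustration of how target.subset and append interact, the following sketch trains models for two batches of targets and writes both into the same results folder. The folder name and the choice of batches (the first four column names of the expression table) are assumptions made for this example only.

```r
library(mistyR)
library(dplyr)

# Build the view composition from the bundled synthetic data
data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# Hypothetical output location used only for this sketch
out.dir <- file.path(tempdir(), "misty_subset")

# First pass: train models for the first two markers only
run_misty(misty.views,
  results.folder = out.dir,
  target.subset = colnames(expr)[1:2]
)

# Second pass: train models for two more markers and append to the
# performance and coefficient files written by the first pass
result.path <- run_misty(misty.views,
  results.folder = out.dir,
  target.subset = colnames(expr)[3:4],
  append = TRUE
)
```

Without append = TRUE, the second call would be expected to overwrite the performance and coefficient files from the first call rather than extend them.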

Value

Path to the results folder that can be passed to collect_results().
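
A minimal sketch of using the returned path, assuming the bundled synthetic data and a temporary results folder chosen for this example:

```r
library(mistyR)
library(dplyr)

data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# run_misty() returns the path of the folder holding the raw results
# (the folder name here is a hypothetical choice for this sketch)
result.path <- run_misty(misty.views,
  results.folder = file.path(tempdir(), "misty_collect")
)

# The path can be handed directly to collect_results()
misty.results <- collect_results(result.path)

# Inspect which summaries were collected
names(misty.results)
```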

Details

If bypass.intra is set to TRUE, all variables in the intraview data will be treated as targets only. The baseline intraview model in this case is a trivial model that predicts the average of each target. If the intraview has only one variable, this switch is automatically set to TRUE.
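
The switch can be set explicitly, as in this sketch. The data and the results folder name are assumptions for illustration; with bypass.intra = TRUE, only the paraview contributes predictors.

```r
library(mistyR)
library(dplyr)

data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# Intraview markers act as targets only; the baseline is the trivial
# mean-predicting model described above
result.path <- run_misty(misty.views,
  results.folder = file.path(tempdir(), "misty_bypass"), # hypothetical folder
  bypass.intra = TRUE
)
```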

The default model for training the view-specific models is a random forest based on ranger() -- run_misty(views, model.function = random_forest_model)

The following parameters are the default configuration: num.trees = 100, importance = "impurity", num.threads = 1, seed = seed.
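
Since additional arguments are forwarded to the chosen model through ..., a default such as num.trees can presumably be overridden at the call site. The sketch below assumes that forwarded arguments take precedence over the defaults listed above; the value 500 and the folder name are arbitrary choices for illustration.

```r
library(mistyR)
library(dplyr)

data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# num.trees is a ranger() argument passed through ... (assumed to
# replace the default of 100 trees)
result.path <- run_misty(misty.views,
  results.folder = file.path(tempdir(), "misty_rf"), # hypothetical folder
  model.function = random_forest_model,
  num.trees = 500
)
```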

Gradient boosting is an alternative for modeling each view. The algorithm is based on xgb.train() -- run_misty(views, model.function = gradient_boosting_model)

The following parameters are the default configuration: booster = "gbtree", rounds = 10, objective = "reg:squarederror". Set booster to "gblinear" for linear boosting.
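
A sketch of switching to linear boosting as described above. It assumes that the xgboost package is installed and that booster is forwarded through ...; the results folder name is hypothetical.

```r
library(mistyR)
library(dplyr)

data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# Use gradient boosting with a linear booster instead of trees
result.path <- run_misty(misty.views,
  results.folder = file.path(tempdir(), "misty_gblinear"), # hypothetical folder
  model.function = gradient_boosting_model,
  booster = "gblinear"
)
```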

Bagged MARS is an alternative for modeling each view using multivariate adaptive regression splines (MARS) models trained on bootstrap aggregation samples. The algorithm is based on earth() -- run_misty(views, model.function = bagged_mars_model)

The following parameters are the default configuration: degree = 2. Furthermore, 50 base learners are used by default (pass n.bags as a parameter via ... to change this value).
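
For example, the number of base learners can be reduced by passing n.bags through ..., as in this sketch. It assumes the earth package is installed; the value 20 and the folder name are arbitrary choices for illustration.

```r
library(mistyR)
library(dplyr)

data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
pos <- synthetic[[1]] %>% select(row, col)
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)

# Train bagged MARS models with 20 base learners instead of the default 50
result.path <- run_misty(misty.views,
  results.folder = file.path(tempdir(), "misty_bmars"), # hypothetical folder
  model.function = bagged_mars_model,
  n.bags = 20
)
```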

MARS is an alternative for modeling each view using a single multivariate adaptive regression splines model. The algorithm is based on earth() -- run_misty(views, model.function = mars_model)

The following parameters are the default configuration: degree = 2.

Linear model is an alternative for modeling each view using a simple linear model. The algorithm is based on lm() -- run_misty(views, model.function = linear_model)

SVM is an alternative for modeling each view using a support vector machine. The algorithm is based on ksvm() -- run_misty(views, model.function = svm_model)

The following parameters are the default configuration: kernel = "vanilladot" (linear kernel), C = 1, type = "eps-svr".

MLP is an alternative for modeling each view using a multi-layer perceptron. The algorithm is based on mlp() -- run_misty(views, model.function = mlp_model)

The following parameters are the default configuration: size = c(10) (meaning we have 1 hidden layer with 10 units).

Examples

# Create a view composition of an intraview and a paraview with radius 10 then
# run MISTy for a single sample.

library(mistyR)
library(dplyr)

# get the expression data
data("synthetic")
expr <- synthetic[[1]] %>% select(-c(row, col, type))
# get the coordinates for each cell
pos <- synthetic[[1]] %>% select(row, col)

# compose
misty.views <- create_initial_view(expr) %>% add_paraview(pos, l = 10)
#> 
#> Generating paraview

# run with default parameters
run_misty(misty.views)
#> 
#> Training models
#> [1] "/tmp/RtmpcrfSZ8/file2484477ca03e/reference/results"
