vignettes/vignette_models_package.Rmd
vignette_models_package.Rmd
The different Machine Learning-based methods avaialble in the package
are presented in the table below. The first column refers to the name of
the method as it should be given in the prediction_method
argument of the predict_trait_MET_cv()
function to be used
in Step 2.
The second column indicates how many From the third column on, different
data subsets (= sub-sampling of features) are used and fitted on the
training set. Some models, such as stacking ensembles, use a combination
of base learners, which are individually fitted on different data
subsets.
The type of predictive modeling approach used (e.g. tree-based methods
such as gradient boosted trees; support vector machines; multilayer
perceptrons) to fit each data subset is precised in the table.
For example, the model stacking_reg_1
uses as base learners
two SVM models: one is fitted on the training data sub-sampled for
marker data, and the second one is fitted on the training data
subsampled for environmental data. A meta-learner (LASSO model) is used
to determine the weight of the predictions from the respective base
learners. Hence, the final model is based on the
stacking of two base models fitted on the same training
set, but sub-sampled with different predictor variables.
The suffix reg
refers to the fact that the method should be
used for regression tasks.
Name of the prediction_method in step 2 | Genomic PCs derived from genotype matrix + environmental predictor variables | Genomic PCs derived from genomic relationship matrix + environmental predictor variables | All SNPs predictor variables + environmental predictor variables | Only molecular marker predictor variables | Only environmental predictor variables | GxE interaction dataset = QTLs with environmental variables | GxE interaction dataset = Principal components (from geno matrix) with environmental variables | Stacking model = combination of models |
---|---|---|---|---|---|---|---|---|
xgb_reg_1 | X | |||||||
xgb_reg_2 | X | |||||||
xgb_reg_3 | X | |||||||
rf_reg_1 | X | |||||||
rf_reg_2 | X | |||||||
rf_reg_3 | X | |||||||
stacking_reg_1 | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X | |||||
stacking_reg_2 | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X | ||||
stacking_reg_3 | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X (support vector machine. Linear, Radial Basis Function (RBF) Kernel, or Polynomial kernel) | X | ||||
DL_reg_1 | X | |||||||
DL_reg_2 | X | |||||||
DL_reg_3 | X |
Kuhn and Wickham (2020) Friedman (2001) Breiman (2001) Chen et al. (2015) Van der Laan, Polley, and Hubbard (2007)