Implements trait prediction based on SNP and environmental data, with a choice
of machine learning prediction methods.
This function assesses predictive ability under a cross-validation scheme
specified by the user.
Usage
predict_trait_MET_cv(
METData,
trait,
prediction_method,
lat_lon_included = FALSE,
year_included = FALSE,
cv_type = "cv0",
cv0_type = "leave-one-environment-out",
nb_folds_cv1 = 5,
repeats_cv1 = 50,
nb_folds_cv2 = 5,
repeats_cv2 = 50,
include_env_predictors = TRUE,
list_env_predictors = NULL,
use_selected_markers = FALSE,
list_selected_markers_manual = NULL,
seed = NULL,
save_splits = FALSE,
save_processing = FALSE,
path_folder,
save_model = FALSE,
...
)
Arguments
METData |
list An object created by create_METData(), the initial function of the
package. |
trait |
character Name of the trait to predict.
|
prediction_method |
character Predictive model to use.
Options are currently xgb_reg_1 (gradient boosted trees), xgb_reg_2,
xgb_reg_3, DL_reg_1 (multilayer perceptrons), DL_reg_2, DL_reg_3,
stacking_reg_1 (stacked models), stacking_reg_2, stacking_reg_3,
rf_reg_1 (random forests), rf_reg_2, rf_reg_3. |
lat_lon_included |
logical Indicates whether longitude and latitude data should be used as
numeric predictors. Default is FALSE. |
year_included |
logical Indicates whether the year factor should be used as a predictor
variable. Default is FALSE. |
cv_type |
character One of cv0 (prediction of new environments), cv00 (prediction
of new genotypes in new environments), cv1 (prediction of new genotypes) or
cv2 (prediction of incomplete field trials). Default is cv0. |
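To make the cv00 scheme concrete, the base R sketch below builds one training/test partition from a toy table of genotype x environment records. It is an illustration under assumptions (hypothetical columns geno and env, one held-out environment, two held-out genotypes), not the package's internal splitting code: test records combine an unseen environment with unseen genotypes, and the training set excludes both.

```r
# Toy MET layout (hypothetical data): 6 genotypes x 3 environments
pheno <- expand.grid(geno = paste0("G", 1:6), env = paste0("E", 1:3),
                     stringsAsFactors = FALSE)

set.seed(1)
test_env  <- "E3"                            # environment treated as new
test_geno <- sample(unique(pheno$geno), 2)   # genotypes treated as new

# Test set: held-out genotypes in the held-out environment
is_test  <- pheno$env == test_env & pheno$geno %in% test_geno
# Training set: neither the held-out environment nor the held-out genotypes
is_train <- pheno$env != test_env & !(pheno$geno %in% test_geno)

test_set  <- pheno[is_test, ]
train_set <- pheno[is_train, ]
```

Records of held-out genotypes in training environments, and of training genotypes in the held-out environment, are used in neither set in this sketch.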
cv0_type |
character One of leave-one-environment-out, leave-one-site-out,
leave-one-year-out or forward-prediction. Default is
leave-one-environment-out. |
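A minimal base R sketch of how the four cv0_type options could partition a set of environments. The environment table, its site and year columns and the helper split_leave_one_out() are hypothetical, and forward-prediction is assumed here to train on all years before the test year; the package may define these splits differently.

```r
# Hypothetical environments, one row each, with their site and year
envs <- data.frame(
  env  = c("A_2018", "A_2019", "B_2018", "B_2019", "C_2020"),
  site = c("A", "A", "B", "B", "C"),
  year = c(2018, 2019, 2018, 2019, 2020),
  stringsAsFactors = FALSE
)

# Each list element holds the test environments of one split;
# all remaining environments form the training set.
split_leave_one_out <- function(envs, by) {
  split(envs$env, envs[[by]])
}

loeo <- split_leave_one_out(envs, "env")   # leave-one-environment-out
loso <- split_leave_one_out(envs, "site")  # leave-one-site-out
loyo <- split_leave_one_out(envs, "year")  # leave-one-year-out

# Forward prediction (assumed rule): each year is predicted from earlier years only
years <- sort(unique(envs$year))
forward <- lapply(years[-1], function(y) {
  list(train = envs$env[envs$year < y], test = envs$env[envs$year == y])
})
```

The first year yields no forward split here, because there is nothing earlier to train on.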
nb_folds_cv1 |
numeric Number of folds used in the CV1 scheme.
Default is 5. |
repeats_cv1 |
numeric Number of repeats of the CV1 scheme.
Default is 50. |
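In CV1, genotypes rather than individual records are assigned to folds, so a test genotype is never observed in training. The sketch below (hypothetical helper make_cv1_splits() and toy data, base R only) shows why nb_folds_cv1 = 5 and repeats_cv1 = 50 give 5 x 50 = 250 training/test partitions, which under this assumption would also be the length of list_results_cv.

```r
# Hypothetical phenotype table: 10 genotypes tested in 3 environments
pheno <- expand.grid(geno = sprintf("G%02d", 1:10), env = c("E1", "E2", "E3"),
                     stringsAsFactors = FALSE)

make_cv1_splits <- function(pheno, nb_folds = 5, repeats = 50, seed = 123) {
  set.seed(seed)
  genos <- unique(pheno$geno)
  splits <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each genotype (not each record) to one fold
    fold_of_geno <- sample(rep(seq_len(nb_folds), length.out = length(genos)))
    for (k in seq_len(nb_folds)) {
      test_genos <- genos[fold_of_geno == k]
      splits[[length(splits) + 1]] <- list(
        repeat_id = r, fold = k,
        train = which(!(pheno$geno %in% test_genos)),  # row indices
        test  = which(pheno$geno %in% test_genos)
      )
    }
  }
  splits
}

splits <- make_cv1_splits(pheno, nb_folds = 5, repeats = 50)
length(splits)  # 250
```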
nb_folds_cv2 |
numeric Number of folds used in the CV2 scheme.
Default is 5. |
repeats_cv2 |
numeric Number of repeats of the CV2 scheme.
Default is 50. |
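For CV2, one common approach (assumed here, not confirmed as the package's procedure) is to mask random genotype x environment records, so that a test genotype is usually observed in other environments during training, as in an incomplete field trial. The helper make_cv2_splits() and the toy data are hypothetical.

```r
# Hypothetical phenotype table: 10 genotypes tested in 3 environments
pheno <- expand.grid(geno = sprintf("G%02d", 1:10), env = c("E1", "E2", "E3"),
                     stringsAsFactors = FALSE)

make_cv2_splits <- function(pheno, nb_folds = 5, repeats = 50, seed = 123) {
  set.seed(seed)
  n <- nrow(pheno)
  splits <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each record (genotype x environment) to one fold
    fold_of_obs <- sample(rep(seq_len(nb_folds), length.out = n))
    for (k in seq_len(nb_folds)) {
      splits[[length(splits) + 1]] <- list(
        repeat_id = r, fold = k,
        train = which(fold_of_obs != k),
        test  = which(fold_of_obs == k)
      )
    }
  }
  splits
}

splits_cv2 <- make_cv2_splits(pheno)
```

Unlike CV1, the unit assigned to folds is the single record, which is what lets the same genotype appear on both sides of a split.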
include_env_predictors |
logical Indicates whether environmental covariates characterizing each
environment should be used in predictions. Default is TRUE. |
list_env_predictors |
character vector Names of the environmental predictors to use in
predictions. Default is NULL, in which case all environmental predictors
included in the env_data table of the METData object are used. |
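The NULL default of list_env_predictors can be sketched as follows. The column name IDenv, the covariate names and the helper select_env_predictors() are hypothetical; the sketch only shows the rule that NULL keeps every covariate while a character vector keeps the named ones.

```r
# Hypothetical env_data: one row per environment, an ID column plus covariates
env_data <- data.frame(IDenv = c("E1", "E2"),
                       mean_temp = c(18.2, 21.5),
                       total_rain = c(310, 245),
                       solar_rad = c(16.1, 18.3))

select_env_predictors <- function(env_data, list_env_predictors = NULL,
                                  id_col = "IDenv") {
  all_predictors <- setdiff(names(env_data), id_col)
  if (is.null(list_env_predictors)) {
    keep <- all_predictors            # NULL: keep every covariate
  } else {
    unknown <- setdiff(list_env_predictors, all_predictors)
    if (length(unknown) > 0) {
      stop("Unknown environmental predictors: ", paste(unknown, collapse = ", "))
    }
    keep <- list_env_predictors       # keep only the requested covariates
  }
  env_data[, c(id_col, keep), drop = FALSE]
}

all_cov   <- select_env_predictors(env_data)
rain_only <- select_env_predictors(env_data, "total_rain")
```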
use_selected_markers |
logical Indicates whether to use, as predictor variables, a subset of
markers identified either via single-environment GWAS or from the table of
marker effects obtained via Elastic Net, when main genetic effects are
modeled with principal components. Default is FALSE.
If use_selected_markers is TRUE and list_selected_markers_manual is NULL,
the select_markers() function is called in the pipeline.
For more details, see select_markers(). |
list_selected_markers_manual |
Markers specified manually by the user, used when use_selected_markers is
TRUE. Default is NULL, in which case select_markers() is called. |
seed |
integer Seed value. Default is NULL, in which case a random seed is
generated. |
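When seed is NULL, a random seed has to be drawn and kept so the splits remain reproducible; the Value section reports it as seed_used. The helper resolve_seed() below is a hypothetical base R sketch of that idea, not the package's code.

```r
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) {
    # No seed supplied: draw one at random so it can still be reported back
    seed <- sample.int(.Machine$integer.max, 1)
  }
  as.integer(seed)
}

seed_used <- resolve_seed(NULL)

# Reusing the recorded seed reproduces the same random draws (and splits)
set.seed(seed_used); a <- sample(100, 3)
set.seed(seed_used); b <- sample(100, 3)
identical(a, b)  # TRUE
```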
save_splits |
logical Indicates whether the training/test splits should be saved.
Default is FALSE. |
save_processing |
logical Indicates whether the processing steps obtained from the
get_splits_processed_with_method() function should be saved in an .RDS
object. Default is FALSE. |
path_folder |
character Full path of the folder where the .RDS object and the plots
generated during the analysis are saved (do not add a trailing slash after
the name of the last folder). Default is NULL. |
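Because path_folder should not end with a slash, a small normalization step avoids doubled separators when file names are appended. The helper clean_folder() and the file name cv_results.RDS are hypothetical; file.path(), saveRDS() and readRDS() are base R.

```r
# Drop any trailing slash(es) so the folder path matches the documented form
clean_folder <- function(path_folder) sub("/+$", "", path_folder)

path_folder <- clean_folder(file.path(tempdir(), "met_cv_results/"))
dir.create(path_folder, showWarnings = FALSE, recursive = TRUE)

# file.path() inserts a single "/" between the folder and the file name
rds_file <- file.path(path_folder, "cv_results.RDS")
saveRDS(list(example = 1:3), rds_file)
```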
save_model |
logical Indicates whether the fitted model for each training/test
partition should be saved. Default is FALSE. Note that some models (e.g.
stacked models) can require a large amount of memory. |
... |
Arguments passed to the get_splits_processed_with_method()
function. |
Value
A list object of class met_cv with the following items:
- list_results_cv
list of res_fitted_split elements. The length of this list corresponds to
the number of training/test partitions.
- seed_used
integer Seed used to generate the cross-validation splits.
- cv_type
character Cross-validation scheme used (see the cv_type argument).
Author
Cathy C. Westhues cathy.jubin@uni-goettingen.de