Implements trait prediction based on SNP and environmental data, with the
prediction method selected among several machine learning approaches.
This function should be used to assess predictive ability according to
a cross-validation scheme determined by the user.
Usage
predict_trait_MET(
  METData_training,
  METData_new,
  trait,
  prediction_method,
  use_selected_markers = FALSE,
  list_selected_markers_manual = NULL,
  lat_lon_included = FALSE,
  year_included = FALSE,
  include_env_predictors = TRUE,
  list_env_predictors = NULL,
  seed = NULL,
  save_processing = TRUE,
  path_folder,
  save_model = FALSE,
  ...
)
Arguments
METData_training |
A list object created by the function
create_METData() that contains the training set. |
METData_new |
A list object created by the function
create_METData() that contains the test set (no phenotypic observations). |
trait |
A character giving the name of the trait to predict. An ordinal trait
should be encoded as integer.
|
prediction_method |
A character specifying the predictive model to use.
Options are currently xgb_reg_1 (gradient boosted trees), xgb_reg_2,
xgb_reg_3, DL_reg_1 (multilayer perceptrons), DL_reg_2, DL_reg_3,
stacking_reg_1 (stacked models), stacking_reg_2, stacking_reg_3,
rf_reg_1, rf_reg_2, rf_reg_3.
|
use_selected_markers |
A logical indicating whether to use a
subset of markers obtained from a previous step
(see function select_markers()). Default is FALSE. |
lat_lon_included |
A logical indicating whether longitude and latitude
data should be used as numeric predictors. Default is FALSE.
|
year_included |
A logical indicating whether the year factor should be used
as a predictor variable. Default is FALSE.
|
include_env_predictors |
A logical indicating whether
environmental covariates characterizing each environment should be used in
predictions. |
list_env_predictors |
A character vector containing the names
of the environmental predictors which should be used in predictions.
Default is NULL, in which case all environmental predictors included in the
env_data table of the METData object are used. |
seed |
An integer seed value. Default is NULL, in which case a
random seed is generated.
|
save_processing |
A logical indicating whether the processing
steps obtained from the processing_train_test_split() or
processing_train_test_split_kernel() functions should be saved in a .RDS
object. Default is TRUE. |
path_folder |
A character indicating the full path where the .RDS
object and plots generated during the analysis should be saved (do not use
a trailing slash after the name of the last folder). No default value is set
in the function signature. |
save_model |
A logical indicating whether the
fitted model for each training-test partition should be saved. Default is
FALSE. Note that some models (e.g. stacked models) can require a large
amount of memory. |
... |
Arguments passed to the processing_train_test_split(),
processing_train_test_split_kernel(), reg_fitting_train_test_split() and
reg_fitting_train_test_split_kernel() functions. |
cv_type |
A character, one of cv0 (prediction of new
environments), cv00 (prediction of new genotypes in new environments),
cv1 (prediction of new genotypes) or cv2 (prediction of incomplete
field trials). Default is cv0. |
cv0_type |
A character, one of
leave-one-environment-out, leave-one-site-out, leave-one-year-out or
forward-prediction. Default is leave-one-environment-out. |
nb_folds_cv1 |
A numeric giving the number of folds used in the CV1 scheme.
Default is 5. |
repeats_cv1 |
A numeric giving the number of repeats in the CV1 scheme.
Default is 50. |
nb_folds_cv2 |
A numeric giving the number of folds used in the CV2 scheme.
Default is 5. |
repeats_cv2 |
A numeric giving the number of repeats in the CV2 scheme.
Default is 50. |
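The difference between the cv0 and cv1 schemes described above can be illustrated with a minimal base R sketch. It is not the package's internal implementation: the toy table pheno, its column names and the helpers loeo_splits() and repeated_kfold() are hypothetical and only show how leave-one-environment-out splits differ from repeated k-fold partitioning of genotypes.

```r
# Hypothetical toy phenotypic table: 6 genotypes observed in 3 environments.
pheno <- expand.grid(
  geno_ID = paste0("G", 1:6),
  IDenv = c("SiteA_2020", "SiteA_2021", "SiteB_2021"),
  stringsAsFactors = FALSE
)

# Leave-one-environment-out (a cv0 type): each environment is the test set once.
loeo_splits <- function(data, env_col = "IDenv") {
  envs <- unique(data[[env_col]])
  splits <- lapply(envs, function(e) {
    list(
      training = which(data[[env_col]] != e),
      test = which(data[[env_col]] == e)
    )
  })
  names(splits) <- envs
  splits
}

# Repeated k-fold on genotypes (cv1-like): all records of a genotype
# fall in the same fold, so test genotypes are unseen during training.
repeated_kfold <- function(data, geno_col = "geno_ID", nb_folds = 5,
                           repeats = 2, seed = 1) {
  set.seed(seed)
  genos <- unique(data[[geno_col]])
  out <- list()
  for (r in seq_len(repeats)) {
    # Shuffle fold labels so that each repeat gives a different partition.
    fold_id <- sample(rep(seq_len(nb_folds), length.out = length(genos)))
    for (k in seq_len(nb_folds)) {
      test_genos <- genos[fold_id == k]
      out[[paste0("rep", r, "_fold", k)]] <- list(
        training = which(!data[[geno_col]] %in% test_genos),
        test = which(data[[geno_col]] %in% test_genos)
      )
    }
  }
  out
}

cv0 <- loeo_splits(pheno)
cv1 <- repeated_kfold(pheno, nb_folds = 3, repeats = 2, seed = 42)
```

In this sketch, cv0 holds one split per environment, while cv1 holds nb_folds splits per repeat, with training and test row indices never overlapping within a split.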
Value
A list object of class met_cv with the following items:
- list_results_cv: list of res_fitted_split elements. Detailed prediction
results for each split of the data within each element of this list.
- seed_used: integer. Seed used to generate the cross-validation splits.
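Why a stored seed is useful can be shown with a short base R sketch. The helper resolve_seed() is hypothetical and is not part of the package; it only mimics the documented behaviour of drawing a random seed when none is supplied, and shows that re-using the stored value reproduces the same random partition.

```r
# Hypothetical helper: return the user-supplied seed, or draw a random one.
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) {
    seed <- sample.int(.Machine$integer.max, 1)
  }
  as.integer(seed)
}

# No seed supplied: a random one is drawn and kept for later re-use.
seed_used <- resolve_seed(NULL)

# Re-using the stored seed reproduces the same random permutation.
set.seed(seed_used)
first_run <- sample(10)
set.seed(seed_used)
second_run <- sample(10)

identical(first_run, second_run)
```

Passing the stored value back through the seed argument on a later run is, under this assumption, what makes the splits repeatable.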
Author
Cathy C. Westhues cathy.jubin@uni-goettingen.de