Step 1: create METData object

Specify input data and processing parameters (automatic retrieval of external weather data; QC on raw weather data, if provided…)

Main function

new_create_METData() create_METData() validate_create_METData()

Create a multi-environment trials data object
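As a sketch of Step 1, a METData object could be created from the bundled rice datasets listed under "Datasets" below. The argument names here are assumptions based on the package's toy-data examples and should be checked against the help page of create_METData().

```r
library(learnMET)

# Illustrative call: argument names may differ from the current signature.
met_indica <- create_METData(
  geno              = geno_indica,
  pheno             = pheno_indica,
  map               = map_indica,
  climate_variables = climate_variables_indica,
  info_environments = info_environments_indica,
  compute_climatic_ECs = FALSE
)

# Overview of the object (see the summary methods listed below)
summary(met_indica)
```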

Get daily weather data for an environment based on geographical coordinates

get_daily_tables_per_env()

Obtain daily climate data for an environment from NASA POWER data.

daylength()

Compute the day length given the latitude and day of year.
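A minimal, self-contained sketch of a day-length computation from latitude and day of year, using the standard solar-declination approximation (the package's daylength() may use a different formula):

```r
# Day length (hours) from latitude and day of year.
# Assumed formula: Cooper's approximation of solar declination.
daylength_hours <- function(latitude_deg, day_of_year) {
  lat <- latitude_deg * pi / 180
  # Solar declination in radians
  decl <- (23.45 * pi / 180) * sin(2 * pi * (284 + day_of_year) / 365)
  # Sunset hour angle; clamp the cosine to [-1, 1] for polar day/night
  cos_ws <- pmin(pmax(-tan(lat) * tan(decl), -1), 1)
  24 / pi * acos(cos_ws)
}

daylength_hours(45, 80)   # close to 12 h around the March equinox
```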

sat_vap_pressure()

Formula to compute saturated vapour pressure

get.ea()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmean()

Formulas to compute vapour pressure deficit according to available data

get.ea.no.RH()

Formulas to compute vapour pressure deficit according to available data

get.es()

Formulas to compute vapour pressure deficit according to available data

get.esmn()

Formulas to compute vapour pressure deficit according to available data

get.esmx()

Formulas to compute vapour pressure deficit according to available data
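The es/ea helpers above compute the terms of the vapour pressure deficit. A minimal sketch of the underlying FAO-56 formulas, assuming only mean relative humidity is available (function and argument names here are illustrative, not the package's API):

```r
# Saturation vapour pressure (kPa) at air temperature (deg C), FAO-56 Eq. 11
sat_vp <- function(temp_c) 0.6108 * exp(17.27 * temp_c / (temp_c + 237.3))

# Vapour pressure deficit from tmax, tmin and mean relative humidity (%)
vpd_from_rhmean <- function(tmax, tmin, rh_mean) {
  es <- (sat_vp(tmax) + sat_vp(tmin)) / 2   # mean saturation vapour pressure
  ea <- rh_mean / 100 * es                  # actual vapour pressure
  es - ea                                   # VPD (kPa)
}

vpd_from_rhmean(30, 18, 60)
```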

get_soil_per_env()

Obtain soil data for a given environment

penman_monteith_reference_et0()

Calculate reference evapotranspiration (ET0) based on the Penman-Monteith model (FAO-56 method)
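A sketch of the FAO-56 Penman-Monteith reference evapotranspiration equation at a daily time step (variable names are illustrative; the package function's interface may differ):

```r
# FAO-56 Penman-Monteith reference ET0 (mm day-1), Eq. 6 of FAO-56.
et0_fao56 <- function(tmean,     # mean air temperature (deg C)
                      rn,        # net radiation (MJ m-2 day-1)
                      g = 0,     # soil heat flux (~0 at daily time step)
                      u2,        # wind speed at 2 m (m s-1)
                      es, ea,    # saturation/actual vapour pressure (kPa)
                      elev = 0) {  # elevation (m)
  p     <- 101.3 * ((293 - 0.0065 * elev) / 293)^5.26  # atm. pressure (kPa)
  gamma <- 0.000665 * p                                # psychrometric constant
  delta <- 4098 * (0.6108 * exp(17.27 * tmean / (tmean + 237.3))) /
           (tmean + 237.3)^2                  # slope of the sat. vp curve
  (0.408 * delta * (rn - g) +
     gamma * 900 / (tmean + 273) * u2 * (es - ea)) /
    (delta + gamma * (1 + 0.34 * u2))
}
```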

Check daily weather data (non-exhaustive quality control) provided by user

qc_raw_weather_data()

Quality control on daily weather data

Compute environmental covariates based on raw daily weather data

get_ECs()

Compute environmental covariates for each environment of the MET dataset.

compute_EC_fixed_length_window()

Compute ECs based on day-windows of fixed length.

compute_EC_fixed_number_windows()

Compute ECs based on a fixed number of day-windows (fixed number across all environments).

compute_EC_user_defined_intervals()

Compute ECs based on day-window intervals defined by the user.

compute_EC_gdd()

Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment.
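A minimal sketch of growing degree-day (GDD) accumulation, the quantity used above to delimit growth stages. Base and cap temperatures here are illustrative (e.g. 10/30 degC, common for maize):

```r
# Daily GDD with a capped daily-mean method (assumed variant; the package
# may implement GDD differently).
daily_gdd <- function(tmax, tmin, t_base = 10, t_cap = 30) {
  tmean <- (pmin(tmax, t_cap) + pmax(tmin, t_base)) / 2
  pmax(tmean - t_base, 0)
}

# Cumulative GDD over a short made-up temperature series
tmax <- c(22, 28, 31, 25)
tmin <- c(9, 14, 18, 12)
cum_gdd <- cumsum(daily_gdd(tmax, tmin))

# A growth-stage boundary is then the first day crossing a GDD threshold
which(cum_gdd >= 30)[1]
```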

gdd_information()

Internal function of compute_EC_gdd().

get_solar_radiation()

Obtain daily solar radiation for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_wind_data()

Obtain daily wind data for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_elevation()

Obtain elevation data for each field trial based on longitude and latitude data

Clustering of environments based on weather data from the complete training dataset

clustering_env_data()

Clustering of environments solely based on environmental information
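A minimal sketch of clustering environments on their environmental covariates, here with k-means on scaled ECs (the package's clustering_env_data() may use a different algorithm):

```r
# Made-up matrix of environmental covariates: 12 environments x 6 ECs
set.seed(2)
ec <- matrix(rnorm(12 * 6), nrow = 12,
             dimnames = list(paste0("env", 1:12), paste0("EC", 1:6)))

# Scale covariates so each EC contributes comparably, then cluster
km <- kmeans(scale(ec), centers = 3, nstart = 20)

# Environments grouped by cluster
split(rownames(ec), km$cluster)
```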

Overview of the METData object created

summary(<METData>)

Summary of an object of class METData

print(<summary.METData>)

Print the summary of an object of class METData

Step 2: cross-validated model evaluation of the METData

Evaluate predictive ability of a machine learning-based model with a specific CV scheme

Main function

predict_trait_MET_cv()

Cross-validation procedure for phenotypic prediction of crop varieties.
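A sketch of a Step 2 call; the argument names and values below are assumptions based on the package's examples and should be checked against the help page of predict_trait_MET_cv():

```r
library(learnMET)

# Illustrative call: trait name and method/CV identifiers are placeholders.
res_cv <- predict_trait_MET_cv(
  METData           = met_indica,   # object from create_METData() (Step 1)
  trait             = "GY",         # hypothetical trait name
  prediction_method = "xgb_reg_1",  # one of the implemented ML methods
  cv_type           = "cv1"         # CV0, CV00, CV1 or CV2 scheme
)
```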

Create train/test splits to address typical prediction problems for MET datasets

predict_cv0()

Get train/test splits of the phenotypic MET dataset based on CV0.

predict_cv00()

Get train/test splits of the phenotypic MET dataset based on CV00.

predict_cv1()

Get train/test splits of the phenotypic MET dataset based on CV1.

predict_cv2()

Get train/test splits of the phenotypic MET dataset based on CV2.
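The CV schemes above differ in what is left out of training. As an illustration, a CV1-style split (predicting untested genotypes) removes whole genotypes, so no line appears in both training and test sets. A minimal, self-contained sketch:

```r
# CV1-style folds: assign genotypes (not observations) to k folds.
cv1_folds <- function(geno_ids, k = 5, seed = 123) {
  set.seed(seed)
  lines <- unique(geno_ids)
  fold_of_line <- sample(rep_len(1:k, length(lines)))
  lapply(1:k, function(f) {
    test_lines <- lines[fold_of_line == f]
    list(train = which(!geno_ids %in% test_lines),
         test  = which(geno_ids %in% test_lines))
  })
}

# 20 genotypes observed in 3 environments each
ids   <- rep(paste0("G", 1:20), times = 3)
folds <- cv1_folds(ids, k = 5)
```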

Processing of genotypic data for ML-based predictions

apply_pca()

Data dimensionality reduction using PCA on a split object.
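A minimal sketch of the underlying idea: fit PCA on the training genotypes only, then project the test genotypes with the same rotation, so no test information leaks into the components (simulated marker matrix; the package's apply_pca() interface may differ):

```r
# 50 lines x 200 SNPs coded 0/1/2 (simulated)
set.seed(1)
markers <- matrix(rbinom(50 * 200, 2, 0.3), nrow = 50)
train <- 1:40
test  <- 41:50

# Fit PCA on training rows only
pca <- prcomp(markers[train, ], center = TRUE, scale. = FALSE)

# Keep the first 10 PCs as genotypic covariates
train_pcs <- pca$x[, 1:10]
test_pcs  <- predict(pca, markers[test, ])[, 1:10]
```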

apply_pcs_G_Add()

Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix.

select_markers()

Selection of specific SNPs covariates.

marker_effect_per_env_EN()

Compute marker effects per environment with Elastic Net

marker_effect_per_env_FarmCPU()

Compute marker P-values for each environment with GWAS.

ML-methods implemented: processing functions according to the method

get_splits_processed_with_method()

Assign a processing method to each list of training/test splits

new_stacking_reg_1() stacking_reg_1() validate_stacking_reg_1()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_1 (stacking of SVM models) regression model.

new_stacking_reg_2() stacking_reg_2() validate_stacking_reg_2()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_2 (stacking of SVM models) regression model.

new_stacking_reg_3() stacking_reg_3() validate_stacking_reg_3()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_3 (stacking of SVM models) regression model.

new_xgb_reg_1() xgb_reg_1() validate_xgb_reg_1()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_1 (gradient boosted tree) regression model.

new_xgb_reg_2() xgb_reg_2() validate_xgb_reg_2()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_2 (gradient boosted tree) regression model.

new_xgb_reg_3() xgb_reg_3() validate_xgb_reg_3()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_3 (gradient boosted tree) regression model.

new_DL_reg_1() DL_reg_1() validate_DL_reg_1()

Processing of a split object to get data ready to be used and fitted with a DL_reg_1 (neural network) regression model.

new_DL_reg_2() DL_reg_2() validate_DL_reg_2()

Processing of a split object to get data ready to be used and fitted with a DL_reg_2 (neural network) regression model.

new_DL_reg_3() DL_reg_3() validate_DL_reg_3()

Processing of a split object to get data ready to be used and fitted with a DL_reg_3 (neural network) regression model.

new_rf_reg_1() rf_reg_1() validate_rf_reg_1()

Processing of a split object to get data ready to be used and fitted with a rf_reg_1 (random forest) regression model.

new_rf_reg_2() rf_reg_2() validate_rf_reg_2()

Processing of a split object to get data ready to be used and fitted with a rf_reg_2 (random forest) regression model.

new_rf_reg_3() rf_reg_3() validate_rf_reg_3()

Processing of a split object to get data ready to be used and fitted with a rf_reg_3 (random forest) regression model.

ML-methods implemented: fitting functions according to the method

fit_cv_split()

S3 method used to fit an object of class rf_reg_1, rf_reg_2, rf_reg_3, xgb_reg_1, xgb_reg_2, xgb_reg_3, DL_reg, DL_reg_1, DL_reg_2, DL_reg_3, stacking_reg_1, stacking_reg_2 or stacking_reg_3.

Compute variable importance (model-specific and model-agnostic, e.g. permutation-based, methods)

variable_importance_split()

Compute variable importance according to the machine learning algorithm used

Plot cross-validated results for predictive ability

plot_results_cv()

Plot cross-validated results for the ML model and the trait under study.

Plot variable importance results

plot_results_vip_cv()

Plot variable importance scores

Step 3: Create a table of new phenotypes to predict (i.e. for a set of given genotypes in a given environment)

Main function

new_create_METData() create_METData() validate_create_METData()

Create a multi-environment trials data object

Get daily weather data for an environment based on geographical coordinates

get_daily_tables_per_env()

Obtain daily climate data for an environment from NASA POWER data.

daylength()

Compute the day length given the latitude and day of year.

get.ea()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmean()

Formulas to compute vapour pressure deficit according to available data

get.ea.no.RH()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmax()

Formulas to compute vapour pressure deficit according to available data

get.es()

Formulas to compute vapour pressure deficit according to available data

get.esmn()

Formulas to compute vapour pressure deficit according to available data

get.esmx()

Formulas to compute vapour pressure deficit according to available data

Check daily weather data (non-exhaustive quality control) provided by user

qc_raw_weather_data()

Quality control on daily weather data

Compute environmental covariates based on raw daily weather data

get_ECs()

Compute environmental covariates for each environment of the MET dataset.

compute_EC_fixed_length_window()

Compute ECs based on day-windows of fixed length.

compute_EC_fixed_number_windows()

Compute ECs based on a fixed number of day-windows (fixed number across all environments).

compute_EC_gdd()

Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment.

gdd_information()

Internal function of compute_EC_gdd().

get_solar_radiation()

Obtain daily solar radiation for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_wind_data()

Obtain daily wind data for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

Clustering of environments based on weather data from the complete training dataset

clustering_env_data()

Clustering of environments solely based on environmental information

Step 4: Prediction of performance for untested genotypes and/or environments

Implement predictions for unobserved configurations of genotypic and environmental predictors

Main function

predict_trait_MET()

Phenotypic prediction of unobserved data.
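A sketch of a Step 4 call combining the training object (Step 1) with the table of new phenotypes to predict (Step 3); the argument names below are assumptions and should be checked against the help page of predict_trait_MET():

```r
library(learnMET)

# Illustrative call: object and argument names are placeholders.
preds <- predict_trait_MET(
  METData_training  = met_training,  # METData built in Step 1
  METData_new       = met_new,       # METData with phenotypes to predict (Step 3)
  trait             = "GY",          # hypothetical trait name
  prediction_method = "xgb_reg_1"
)
```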

Processing of genotypic data for ML-based predictions

apply_pca()

Data dimensionality reduction using PCA on a split object.

apply_pcs_G_Add()

Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix.

select_markers()

Selection of specific SNPs covariates.

marker_effect_per_env_EN()

Compute marker effects per environment with Elastic Net

marker_effect_per_env_FarmCPU()

Compute marker P-values for each environment with GWAS.

ML-methods implemented: processing functions according to the method

get_splits_processed_with_method()

Assign a processing method to each list of training/test splits

new_stacking_reg_1() stacking_reg_1() validate_stacking_reg_1()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_1 (stacking of SVM models) regression model.

new_stacking_reg_2() stacking_reg_2() validate_stacking_reg_2()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_2 (stacking of SVM models) regression model.

new_stacking_reg_3() stacking_reg_3() validate_stacking_reg_3()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_3 (stacking of SVM models) regression model.

new_xgb_reg_1() xgb_reg_1() validate_xgb_reg_1()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_1 (gradient boosted tree) regression model.

new_xgb_reg_2() xgb_reg_2() validate_xgb_reg_2()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_2 (gradient boosted tree) regression model.

new_xgb_reg_3() xgb_reg_3() validate_xgb_reg_3()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_3 (gradient boosted tree) regression model.

new_DL_reg_1() DL_reg_1() validate_DL_reg_1()

Processing of a split object to get data ready to be used and fitted with a DL_reg_1 (neural network) regression model.

new_DL_reg_2() DL_reg_2() validate_DL_reg_2()

Processing of a split object to get data ready to be used and fitted with a DL_reg_2 (neural network) regression model.

new_DL_reg_3() DL_reg_3() validate_DL_reg_3()

Processing of a split object to get data ready to be used and fitted with a DL_reg_3 (neural network) regression model.

new_rf_reg_1() rf_reg_1() validate_rf_reg_1()

Processing of a split object to get data ready to be used and fitted with a rf_reg_1 (random forest) regression model.

new_rf_reg_2() rf_reg_2() validate_rf_reg_2()

Processing of a split object to get data ready to be used and fitted with a rf_reg_2 (random forest) regression model.

new_rf_reg_3() rf_reg_3() validate_rf_reg_3()

Processing of a split object to get data ready to be used and fitted with a rf_reg_3 (random forest) regression model.

ML-methods implemented: fitting functions according to the method

fit_split()

S3 method used to fit an object of class rf_reg_1, rf_reg_2, rf_reg_3, xgb_reg_1, xgb_reg_2, xgb_reg_3, DL_reg, DL_reg_1, DL_reg_2, DL_reg_3, stacking_reg_1, stacking_reg_2 or stacking_reg_3.

Compute variable importance (model-specific; model-agnostic methods, e.g. permutation-based methods)

variable_importance_split()

Compute variable importance according to the machine learning algorithm used

plot_results_vip()

Plot variable importance scores

Accumulated local effects plots: understand how the value of a variable influences the model's predictions

ALE_plot_split()

ALE plots feature-wise


Step 5: Analysis of prediction results for new observations, by location and by environmental cluster

analysis_predictions()

Analysis of prediction results from predict_trait_MET() by location.

Datasets

Toy data to illustrate the use of the package functions

pheno_indica

Multi-year trial data of rice

geno_indica

Multi-year trial data of rice

map_indica

Multi-year trial data of rice

climate_variables_indica

Multi-year trial data of rice

info_environments_indica

Multi-year trial data of rice

pheno_japonica

Multi-year trial data of rice

geno_japonica

Multi-year trial data of rice

map_japonica

Multi-year trial data of rice

climate_variables_japonica

Multi-year trial data of rice

info_environments_japonica

Multi-year trial data of rice

pheno_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

geno_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

map_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

soil_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

info_environments_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

intervals_growth_manual_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)