Step 1: create METData object

Specify input data and processing parameters (automatic retrieval of external weather data; QC on raw weather data, if provided…)

Main function

new_create_METData() create_METData() validate_create_METData()

Create a multi-environment trials data object
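As a sketch of Step 1, a METData object could be created from the bundled rice datasets listed under "Datasets" below. The argument names here are assumptions based on the package's toy-data examples and should be checked against the help page of create_METData().

```r
library(learnMET)

# Illustrative call: argument names may differ from the current signature.
met_indica <- create_METData(
  geno              = geno_indica,
  pheno             = pheno_indica,
  map               = map_indica,
  climate_variables = climate_variables_indica,
  info_environments = info_environments_indica,
  compute_climatic_ECs = FALSE
)

# Overview of the object (see the summary methods listed below)
summary(met_indica)
```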

Get daily weather data for an environment based on geographical coordinates

get_daily_tables_per_env()

Obtain daily climate data for an environment from NASA POWER data.

daylength()

Compute the day length given the latitude and day of year.
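A minimal, self-contained sketch of a day-length computation from latitude and day of year, using the standard solar-declination approximation (the package's daylength() may use a different formula):

```r
# Day length (hours) from latitude and day of year.
# Assumed formula: Cooper's approximation of solar declination.
daylength_hours <- function(latitude_deg, day_of_year) {
  lat <- latitude_deg * pi / 180
  # Solar declination in radians
  decl <- (23.45 * pi / 180) * sin(2 * pi * (284 + day_of_year) / 365)
  # Sunset hour angle; clamp the cosine to [-1, 1] for polar day/night
  cos_ws <- pmin(pmax(-tan(lat) * tan(decl), -1), 1)
  24 / pi * acos(cos_ws)
}

daylength_hours(45, 80)   # close to 12 h around the March equinox
```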

sat_vap_pressure()

Formula to compute saturated vapour pressure

get.ea()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmean()

Formulas to compute vapour pressure deficit according to available data

get.ea.no.RH()

Formulas to compute vapour pressure deficit according to available data

get.es()

Formulas to compute vapour pressure deficit according to available data

get.esmn()

Formulas to compute vapour pressure deficit according to available data

get.esmx()

Formulas to compute vapour pressure deficit according to available data
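The es/ea helpers above compute the terms of the vapour pressure deficit. A minimal sketch of the underlying FAO-56 formulas, assuming only mean relative humidity is available (function and argument names here are illustrative, not the package's API):

```r
# Saturation vapour pressure (kPa) at air temperature (deg C), FAO-56 Eq. 11
sat_vp <- function(temp_c) 0.6108 * exp(17.27 * temp_c / (temp_c + 237.3))

# Vapour pressure deficit from tmax, tmin and mean relative humidity (%)
vpd_from_rhmean <- function(tmax, tmin, rh_mean) {
  es <- (sat_vp(tmax) + sat_vp(tmin)) / 2   # mean saturation vapour pressure
  ea <- rh_mean / 100 * es                  # actual vapour pressure
  es - ea                                   # VPD (kPa)
}

vpd_from_rhmean(30, 18, 60)
```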

get_soil_per_env()

Obtain soil data for a given environment

penman_monteith_reference_et0()

Calculate reference evapotranspiration (ET0) based on the Penman-Monteith model (FAO-56 method)
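A sketch of the FAO-56 Penman-Monteith reference evapotranspiration equation at a daily time step (variable names are illustrative; the package function's interface may differ):

```r
# FAO-56 Penman-Monteith reference ET0 (mm day-1), Eq. 6 of FAO-56.
et0_fao56 <- function(tmean,     # mean air temperature (deg C)
                      rn,        # net radiation (MJ m-2 day-1)
                      g = 0,     # soil heat flux (~0 at daily time step)
                      u2,        # wind speed at 2 m (m s-1)
                      es, ea,    # saturation/actual vapour pressure (kPa)
                      elev = 0) {  # elevation (m)
  p     <- 101.3 * ((293 - 0.0065 * elev) / 293)^5.26  # atm. pressure (kPa)
  gamma <- 0.000665 * p                                # psychrometric constant
  delta <- 4098 * (0.6108 * exp(17.27 * tmean / (tmean + 237.3))) /
           (tmean + 237.3)^2                  # slope of the sat. vp curve
  (0.408 * delta * (rn - g) +
     gamma * 900 / (tmean + 273) * u2 * (es - ea)) /
    (delta + gamma * (1 + 0.34 * u2))
}
```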

Check daily weather data (non-exhaustive quality control) provided by user

qc_raw_weather_data()

Quality control on daily weather data

Compute environmental covariates based on raw daily weather data

get_ECs()

Compute environmental covariates for each environment of the MET dataset.

compute_EC_fixed_length_window()

Compute ECs based on day-windows of fixed length.

compute_EC_fixed_number_windows()

Compute ECs based on a fixed number of day-windows (fixed number across all environments).

compute_EC_user_defined_intervals()

Compute ECs based on day-window intervals defined by the user.

compute_EC_gdd()

Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment.
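A minimal sketch of growing degree-day (GDD) accumulation, the quantity used above to delimit growth stages. Base and cap temperatures here are illustrative (e.g. 10/30 degC, common for maize):

```r
# Daily GDD with a capped daily-mean method (assumed variant; the package
# may implement GDD differently).
daily_gdd <- function(tmax, tmin, t_base = 10, t_cap = 30) {
  tmean <- (pmin(tmax, t_cap) + pmax(tmin, t_base)) / 2
  pmax(tmean - t_base, 0)
}

# Cumulative GDD over a short made-up temperature series
tmax <- c(22, 28, 31, 25)
tmin <- c(9, 14, 18, 12)
cum_gdd <- cumsum(daily_gdd(tmax, tmin))

# A growth-stage boundary is then the first day crossing a GDD threshold
which(cum_gdd >= 30)[1]
```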

gdd_information()

Internal function of compute_EC_gdd().

get_solar_radiation()

Obtain daily solar radiation for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_wind_data()

Obtain daily wind data for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_elevation()

Obtain elevation data for each field trial based on longitude and latitude data

Clustering of environments based on weather data from the complete training dataset

clustering_env_data()

Clustering of environments solely based on environmental information
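A minimal sketch of clustering environments on their environmental covariates, here with k-means on scaled ECs (the package's clustering_env_data() may use a different algorithm):

```r
# Made-up matrix of environmental covariates: 12 environments x 6 ECs
set.seed(2)
ec <- matrix(rnorm(12 * 6), nrow = 12,
             dimnames = list(paste0("env", 1:12), paste0("EC", 1:6)))

# Scale covariates so each EC contributes comparably, then cluster
km <- kmeans(scale(ec), centers = 3, nstart = 20)

# Environments grouped by cluster
split(rownames(ec), km$cluster)
```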

Overview of the METData object created

summary(<METData>)

Summary of an object of class METData

print(<summary.METData>)

Print the summary of an object of class METData

Step 2: cross-validated model evaluation of the METData

Evaluate predictive ability of a machine learning-based model with a specific CV scheme

Main function

predict_trait_MET_cv()

Cross-validation procedure for phenotypic prediction of crop varieties.
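A sketch of a Step 2 call; the argument names and values below are assumptions based on the package's examples and should be checked against the help page of predict_trait_MET_cv():

```r
library(learnMET)

# Illustrative call: trait name and method/CV identifiers are placeholders.
res_cv <- predict_trait_MET_cv(
  METData           = met_indica,   # object from create_METData() (Step 1)
  trait             = "GY",         # hypothetical trait name
  prediction_method = "xgb_reg_1",  # one of the implemented ML methods
  cv_type           = "cv1"         # CV0, CV00, CV1 or CV2 scheme
)
```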

Create train/test splits to address typical prediction problems for MET datasets

predict_cv0()

Get train/test splits of the phenotypic MET dataset based on CV0.

predict_cv00()

Get train/test splits of the phenotypic MET dataset based on CV00.

predict_cv1()

Get train/test splits of the phenotypic MET dataset based on CV1.

predict_cv2()

Get train/test splits of the phenotypic MET dataset based on CV2.
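The CV schemes above differ in what is left out of training. As an illustration, a CV1-style split (predicting untested genotypes) removes whole genotypes, so no line appears in both training and test sets. A minimal, self-contained sketch:

```r
# CV1-style folds: assign genotypes (not observations) to k folds.
cv1_folds <- function(geno_ids, k = 5, seed = 123) {
  set.seed(seed)
  lines <- unique(geno_ids)
  fold_of_line <- sample(rep_len(1:k, length(lines)))
  lapply(1:k, function(f) {
    test_lines <- lines[fold_of_line == f]
    list(train = which(!geno_ids %in% test_lines),
         test  = which(geno_ids %in% test_lines))
  })
}

# 20 genotypes observed in 3 environments each
ids   <- rep(paste0("G", 1:20), times = 3)
folds <- cv1_folds(ids, k = 5)
```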

Processing of genotypic data for ML-based predictions

apply_pca()

Data dimensionality reduction using PCA on a split object.
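A minimal sketch of the underlying idea: fit PCA on the training genotypes only, then project the test genotypes with the same rotation, so no test information leaks into the components (simulated marker matrix; the package's apply_pca() interface may differ):

```r
# 50 lines x 200 SNPs coded 0/1/2 (simulated)
set.seed(1)
markers <- matrix(rbinom(50 * 200, 2, 0.3), nrow = 50)
train <- 1:40
test  <- 41:50

# Fit PCA on training rows only
pca <- prcomp(markers[train, ], center = TRUE, scale. = FALSE)

# Keep the first 10 PCs as genotypic covariates
train_pcs <- pca$x[, 1:10]
test_pcs  <- predict(pca, markers[test, ])[, 1:10]
```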

apply_pcs_G_Add()

Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix.

select_markers()

Selection of specific SNPs covariates.

marker_effect_per_env_EN()

Compute marker effects per environment with Elastic Net

marker_effect_per_env_FarmCPU()

Compute marker P-values for each environment with GWAS.

ML-methods implemented: processing functions according to the method

get_splits_processed_with_method()

Assign a processing method to each list of training/test splits

new_stacking_reg_1() stacking_reg_1() validate_stacking_reg_1()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_1 (stacking of SVM models) regression model.

new_stacking_reg_2() stacking_reg_2() validate_stacking_reg_2()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_2 (stacking of SVM models) regression model.

new_stacking_reg_3() stacking_reg_3() validate_stacking_reg_3()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_3 (stacking of SVM models) regression model.

new_xgb_reg_1() xgb_reg_1() validate_xgb_reg_1()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_1 (gradient boosted tree) regression model.

new_xgb_reg_2() xgb_reg_2() validate_xgb_reg_2()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_2 (gradient boosted tree) regression model.

new_xgb_reg_3() xgb_reg_3() validate_xgb_reg_3()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_3 (gradient boosted tree) regression model.

new_DL_reg_1() DL_reg_1() validate_DL_reg_1()

Processing of a split object to get data ready to be used and fitted with a DL_reg_1 (neural network) regression model.

new_DL_reg_2() DL_reg_2() validate_DL_reg_2()

Processing of a split object to get data ready to be used and fitted with a DL_reg_2 (neural network) regression model.

new_DL_reg_3() DL_reg_3() validate_DL_reg_3()

Processing of a split object to get data ready to be used and fitted with a DL_reg_3 (neural network) regression model.

new_rf_reg_1() rf_reg_1() validate_rf_reg_1()

Processing of a split object to get data ready to be used and fitted with a rf_reg_1 (random forest) regression model.

new_rf_reg_2() rf_reg_2() validate_rf_reg_2()

Processing of a split object to get data ready to be used and fitted with a rf_reg_2 (random forest) regression model.

new_rf_reg_3() rf_reg_3() validate_rf_reg_3()

Processing of a split object to get data ready to be used and fitted with a rf_reg_3 (random forest) regression model.

ML-methods implemented: fitting functions according to the method

fit_cv_split()

S3 method used to fit an object of class rf_reg_1, rf_reg_2, rf_reg_3, xgb_reg_1, xgb_reg_2, xgb_reg_3, DL_reg, DL_reg_1, DL_reg_2, DL_reg_3, stacking_reg_1, stacking_reg_2 or stacking_reg_3.

Compute variable importance (model-specific and model-agnostic, e.g. permutation-based, methods)

variable_importance_split()

Compute variable importance according to the machine learning algorithm used

Plot cross-validated results for predictive ability

plot_results_cv()

Plot cross-validated results for the ML model and the trait under study.

Plot variable importance results

plot_results_vip_cv()

Plot variable importance scores

Step 3: Create a table of new phenotypes to predict (i.e. for a set of given genotypes in a given environment)

Main function

new_create_METData() create_METData() validate_create_METData()

Create a multi-environment trials data object

Get daily weather data for an environment based on geographical coordinates

get_daily_tables_per_env()

Obtain daily climate data for an environment from NASA POWER data.

daylength()

Compute the day length given the latitude and day of year.

get.ea()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmean()

Formulas to compute vapour pressure deficit according to available data

get.ea.no.RH()

Formulas to compute vapour pressure deficit according to available data

get.ea.with.rhmax()

Formulas to compute vapour pressure deficit according to available data

get.es()

Formulas to compute vapour pressure deficit according to available data

get.esmn()

Formulas to compute vapour pressure deficit according to available data

get.esmx()

Formulas to compute vapour pressure deficit according to available data

Check daily weather data (non-exhaustive quality control) provided by user

qc_raw_weather_data()

Quality control on daily weather data

Compute environmental covariates based on raw daily weather data

get_ECs()

Compute environmental covariates for each environment of the MET dataset.

compute_EC_fixed_length_window()

Compute ECs based on day-windows of fixed length.

compute_EC_fixed_number_windows()

Compute ECs based on a fixed number of day-windows (fixed number across all environments).

compute_EC_gdd()

Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment.

gdd_information()

Internal function of compute_EC_gdd().

get_solar_radiation()

Obtain daily solar radiation for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

get_wind_data()

Obtain daily wind data for an IDenv using the nasapower package (data derived by NASA from satellite and atmospheric observations).

Clustering of environments based on weather data from the complete training dataset

clustering_env_data()

Clustering of environments solely based on environmental information

Step 4: Prediction of performance for untested genotypes and/or environments

Implement predictions for unobserved configurations of genotypic and environmental predictors

Main function

predict_trait_MET()

Phenotypic prediction of unobserved data.
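A sketch of a Step 4 call combining the training object (Step 1) with the table of new phenotypes to predict (Step 3); the argument names below are assumptions and should be checked against the help page of predict_trait_MET():

```r
library(learnMET)

# Illustrative call: object and argument names are placeholders.
preds <- predict_trait_MET(
  METData_training  = met_training,  # METData built in Step 1
  METData_new       = met_new,       # METData with phenotypes to predict (Step 3)
  trait             = "GY",          # hypothetical trait name
  prediction_method = "xgb_reg_1"
)
```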

Processing of genotypic data for ML-based predictions

apply_pca()

Data dimensionality reduction using PCA on a split object.

apply_pcs_G_Add()

Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix.

select_markers()

Selection of specific SNPs covariates.

marker_effect_per_env_EN()

Compute marker effects per environment with Elastic Net

marker_effect_per_env_FarmCPU()

Compute marker P-values for each environment with GWAS.

ML-methods implemented: processing functions according to the method

get_splits_processed_with_method()

Assign a processing method to each list of training/test splits

new_stacking_reg_1() stacking_reg_1() validate_stacking_reg_1()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_1 (stacking of SVM models) regression model.

new_stacking_reg_2() stacking_reg_2() validate_stacking_reg_2()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_2 (stacking of SVM models) regression model.

new_stacking_reg_3() stacking_reg_3() validate_stacking_reg_3()

Processing of a split object to get data ready to be used and fitted with a stacking_reg_3 (stacking of SVM models) regression model.

new_xgb_reg_1() xgb_reg_1() validate_xgb_reg_1()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_1 (gradient boosted tree) regression model.

new_xgb_reg_2() xgb_reg_2() validate_xgb_reg_2()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_2 (gradient boosted tree) regression model.

new_xgb_reg_3() xgb_reg_3() validate_xgb_reg_3()

Processing of a split object to get data ready to be used and fitted with a xgb_reg_3 (gradient boosted tree) regression model.

new_DL_reg_1() DL_reg_1() validate_DL_reg_1()

Processing of a split object to get data ready to be used and fitted with a DL_reg_1 (neural network) regression model.

new_DL_reg_2() DL_reg_2() validate_DL_reg_2()

Processing of a split object to get data ready to be used and fitted with a DL_reg_2 (neural network) regression model.

new_DL_reg_3() DL_reg_3() validate_DL_reg_3()

Processing of a split object to get data ready to be used and fitted with a DL_reg_3 (neural network) regression model.

new_rf_reg_1() rf_reg_1() validate_rf_reg_1()

Processing of a split object to get data ready to be used and fitted with a rf_reg_1 (random forest) regression model.

new_rf_reg_2() rf_reg_2() validate_rf_reg_2()

Processing of a split object to get data ready to be used and fitted with a rf_reg_2 (random forest) regression model.

new_rf_reg_3() rf_reg_3() validate_rf_reg_3()

Processing of a split object to get data ready to be used and fitted with a rf_reg_3 (random forest) regression model.

ML-methods implemented: fitting functions according to the method

fit_split()

S3 method used to fit an object of class rf_reg_1, rf_reg_2, rf_reg_3, xgb_reg_1, xgb_reg_2, xgb_reg_3, DL_reg, DL_reg_1, DL_reg_2, DL_reg_3, stacking_reg_1, stacking_reg_2 or stacking_reg_3.

Compute variable importance (model-specific; model-agnostic methods, e.g. permutation-based methods)

variable_importance_split()

Compute variable importance according to the machine learning algorithm used

plot_results_vip()

Plot variable importance scores

Accumulated local effects plots: understand how the value of a variable influences the model's predictions

ALE_plot_split()

ALE plots feature-wise


Step 5: Analysis of prediction results for new observations, by location and by environmental cluster

analysis_predictions()

Analysis of prediction results from predict_trait_MET() by location.

Datasets

Toy data to illustrate the use of the package functions

pheno_indica

Multi-year trial data of rice

geno_indica

Multi-year trial data of rice

map_indica

Multi-year trial data of rice

climate_variables_indica

Multi-year trial data of rice

info_environments_indica

Multi-year trial data of rice

pheno_japonica

Multi-year trial data of rice

geno_japonica

Multi-year trial data of rice

map_japonica

Multi-year trial data of rice

climate_variables_japonica

Multi-year trial data of rice

info_environments_japonica

Multi-year trial data of rice

pheno_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

geno_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

map_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

soil_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

info_environments_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)

intervals_growth_manual_G2F

Maize experimental multi-environment data sets (Genomes to Fields Initiative)