Step 1: Create METData object |
Specify input data and processing parameters (automatic retrieval of external weather data; QC on raw weather data, if provided…) |
|
---|---|
Main function |
|
|
Create a multi-environment trials data object |
Get daily weather data for an environment based on geographical coordinates |
|
Obtain daily climate data for an environment from NASA POWER data. |
|
Compute the day length given the latitude and day of year. |
|
Formulas to compute saturated vapor pressure deficit |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Obtain soil data for a given environment |
|
Calculates reference ET0 based on the Penman-Monteith model (FAO-56 Method) |
|
Check daily weather data (non-exhaustive quality control) provided by user |
|
Quality control on daily weather data |
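The vapour pressure deficit entries above differ in which humidity variables are available. As a minimal sketch of that idea, assuming the standard FAO-56 formulas (not necessarily the exact ones implemented in this package), the deficit can be derived either from relative humidity or from dew point temperature. The function names below are hypothetical.

```python
import math


def saturation_vapour_pressure(temp_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def mean_saturation_vapour_pressure(tmin: float, tmax: float) -> float:
    """Daily mean saturation vapour pressure (kPa) from min and max temperature."""
    return (saturation_vapour_pressure(tmax) + saturation_vapour_pressure(tmin)) / 2


def vpd_from_relative_humidity(tmin: float, tmax: float,
                               rhmin: float, rhmax: float) -> float:
    """VPD (kPa) when daily min and max relative humidity (%) are available."""
    es = mean_saturation_vapour_pressure(tmin, tmax)
    # Actual vapour pressure: RHmax pairs with Tmin, RHmin pairs with Tmax.
    ea = (saturation_vapour_pressure(tmin) * rhmax / 100
          + saturation_vapour_pressure(tmax) * rhmin / 100) / 2
    return max(es - ea, 0.0)


def vpd_from_dew_point(tmin: float, tmax: float, tdew: float) -> float:
    """VPD (kPa) when dew point temperature (deg C) is available."""
    es = mean_saturation_vapour_pressure(tmin, tmax)
    ea = saturation_vapour_pressure(tdew)
    return max(es - ea, 0.0)
```

Both variants share the same saturation term and only differ in how the actual vapour pressure is estimated, which is why one formula per data situation is enough.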
|
Compute environmental covariates based on raw daily weather data |
|
Compute environmental covariates for each environment of the MET dataset. |
|
Compute ECs based on day-windows of fixed length. |
|
Compute ECs based on a fixed number of day-windows (fixed number across all environments). |
|
Compute ECs based on day-windows of fixed length. |
|
Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment. |
|
Internal function of compute_EC_gdd() |
|
Obtain daily solar radiation data for an IDenv with the package nasapower; the data are derived by NASA from satellite and atmospheric observations. |
|
Obtain daily wind data for an IDenv with the package nasapower; the data are derived by NASA from satellite and atmospheric observations. |
|
Obtain elevation data for each field trial based on longitude and latitude data |
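To illustrate what covariates based on day-windows of fixed length can look like, here is a minimal sketch that slices a daily weather series into consecutive windows and summarises each one. The record keys (`tmin`, `tmax`, `rain`), the base temperature of 10 °C, and the function names are assumptions made for this example, not the package's interface.

```python
def growing_degree_days(tmin: float, tmax: float, base_temp: float = 10.0) -> float:
    """Daily growing degree-days: mean temperature above the base, floored at zero."""
    return max((tmin + tmax) / 2 - base_temp, 0.0)


def fixed_length_windows(n_days: int, window_length: int) -> list:
    """Return (start, end) index pairs, end exclusive; a trailing partial window is dropped."""
    return [(start, start + window_length)
            for start in range(0, n_days - window_length + 1, window_length)]


def window_covariates(daily: list, window_length: int, base_temp: float = 10.0) -> list:
    """Summarise daily weather records into one covariate dict per window."""
    covariates = []
    for start, end in fixed_length_windows(len(daily), window_length):
        chunk = daily[start:end]
        covariates.append({
            "mean_tmean": sum((d["tmin"] + d["tmax"]) / 2 for d in chunk) / len(chunk),
            "sum_gdd": sum(growing_degree_days(d["tmin"], d["tmax"], base_temp)
                           for d in chunk),
            "sum_rain": sum(d["rain"] for d in chunk),
        })
    return covariates
```

With a fixed window length, environments with longer growing seasons yield more windows. A fixed number of windows, or windows delimited by accumulated growing degree-days, keeps the number of covariates equal across environments instead.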
|
Clustering of environments based on weather data from the complete training dataset |
|
Clustering of environments solely based on environmental information |
|
Overview of the METData object created |
|
Summary of an object of class METData |
|
Print the summary of an object of class METData |
|
Step 2: Cross-validated model evaluation of the METData |
Evaluate predictive ability of a machine learning-based model with a specific CV scheme |
|
Main function |
|
Cross-validation procedure for phenotypic prediction of crop varieties. |
|
Create train/test splits to address typical prediction problems for MET datasets |
|
Get train/test splits of the phenotypic MET dataset based on CV0. |
|
Get train/test splits of the phenotypic MET dataset based on CV0. |
|
Get train/test splits of the phenotypic MET dataset based on CV1. |
|
Get train/test splits of the phenotypic MET dataset based on CV2. |
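As an illustration of one such scheme, the sketch below builds leave-one-environment-out splits, a common reading of CV0 in which tested genotypes are predicted in an environment absent from the training set. This is an assumption about the scheme, not a description of the package's functions; the record layout (`geno`, `env`) and the function name are hypothetical.

```python
def leave_one_environment_out(records: list) -> list:
    """Return one (train, test) pair per environment, holding that environment out."""
    environments = sorted({record["env"] for record in records})
    splits = []
    for held_out in environments:
        train = [r for r in records if r["env"] != held_out]
        test = [r for r in records if r["env"] == held_out]
        splits.append((train, test))
    return splits


# Toy phenotypic table: two genotypes observed in three environments.
toy_records = [
    {"geno": geno, "env": env}
    for env in ("E1", "E2", "E3")
    for geno in ("G1", "G2")
]
toy_splits = leave_one_environment_out(toy_records)
```

Holding out whole environments, rather than random rows, prevents observations from the test environment from leaking into training.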
|
Processing of genotypic data for ML-based predictions |
|
Data dimensionality reduction using PCA on a split object. |
|
Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix. |
|
Selection of specific SNPs covariates. |
|
Compute marker effects per environment with Elastic Net |
|
Compute marker P-values for each environment with GWAS. |
|
ML-methods implemented: processing functions according to the method |
|
Attribute a processing method for each list of training/test splits |
|
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
ML-methods implemented: fitting functions according to the method |
|
S3 method used to fit an object of class |
|
Compute variable importance (model-specific and model-free, e.g. permutation-based methods) |
|
Compute variable importance according to the machine learning algorithm used |
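For the permutation-based case, a minimal model-agnostic sketch is shown below: each predictor column is shuffled in turn, and the resulting increase in mean squared error is taken as that predictor's importance. The function names, the error metric, and the toy model are assumptions for illustration only.

```python
import random


def mean_squared_error(y_true: list, y_pred: list) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X: list, y: list,
                           n_repeats: int = 5, seed: int = 0) -> list:
    """Average increase in MSE after shuffling each column of X (list of row lists)."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            permuted = [row[:j] + [column[i]] + row[j + 1:]
                        for i, row in enumerate(X)]
            error = mean_squared_error(y, [predict(row) for row in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that only uses the first predictor.
toy_X = [[float(i), float((i * 7) % 5)] for i in range(10)]
toy_y = [2.0 * row[0] for row in toy_X]
toy_importance = permutation_importance(lambda row: 2.0 * row[0], toy_X, toy_y)
```

Because the procedure only needs a prediction function, it applies to any fitted model, whereas model-specific scores depend on the algorithm's internals.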
|
Plot cross-validated results for predictive ability |
|
Plot cross-validated results for the ML model and the trait under study. |
|
Plot variable importance results |
|
Plot variable importance scores |
|
Step 3: Create a table of new phenotypes to predict (i.e. for a set of given genotypes in a given environment) |
|
Main function |
|
|
Create a multi-environment trials data object |
Get daily weather data for an environment based on geographical coordinates |
|
Obtain daily climate data for an environment from NASA POWER data. |
|
Compute the day length given the latitude and day of year. |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Formulas to compute vapour pressure deficit according to available data |
|
Check daily weather data (non-exhaustive quality control) provided by user |
|
Quality control on daily weather data |
|
Compute environmental covariates based on raw daily weather data |
|
Compute environmental covariates for each environment of the MET dataset. |
|
Compute ECs based on day-windows of fixed length. |
|
Compute ECs based on a fixed number of day-windows (fixed number across all environments). |
|
Compute ECs based on growth stages which are estimated based on accumulated GDD in each environment. |
|
Internal function of compute_EC_gdd() |
|
Obtain daily solar radiation data for an IDenv with the package nasapower; the data are derived by NASA from satellite and atmospheric observations. |
|
Obtain daily wind data for an IDenv with the package nasapower; the data are derived by NASA from satellite and atmospheric observations. |
|
Clustering of environments based on weather data from the complete training dataset |
|
Clustering of environments solely based on environmental information |
|
Step 4: Prediction of performance for untested genotypes and/or environments |
Implement predictions for unobserved configurations of genotypic and environmental predictors |
|
Main function |
|
Phenotypic prediction of unobserved data. |
|
Processing of genotypic data for ML-based predictions |
|
Data dimensionality reduction using PCA on a split object. |
|
Data dimensionality reduction by modeling genetic effects using the PCs of the genomic relationship matrix. |
|
Selection of specific SNPs covariates. |
|
Compute marker effects per environment with Elastic Net |
|
Compute marker P-values for each environment with GWAS. |
|
ML-methods implemented: processing functions according to the method |
|
Attribute a processing method for each list of training/test splits |
|
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
Processing of a split object to get data ready to be used and fitted with
a |
|
ML-methods implemented: fitting functions according to the method |
|
S3 method used to fit an object of class |
|
Compute variable importance (model-specific; model-agnostic methods, e.g. permutation-based methods) |
|
Compute variable importance according to the machine learning algorithm used |
|
Plot variable importance scores |
|
Accumulated local effects plots: understand how the value of a variable influences changes in the model's predictions |
|
ALE plots feature-wise |
|
Plot variable importance results |
|
Plot variable importance scores |
|
Step 5: Analysis of prediction results for new observations, by location and by environmental cluster |
|
Analysis of prediction results from |
|
Datasets |
Toy data to illustrate the use of the package functions |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Multi-year trial data of rice |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |
|
Maize experimental multi-environment data sets (Genomes to Fields Initiative) |