The function processes a split object (training and test sets) according to the configuration set by the user. For instance, genomic information is incorporated according to the options chosen by the user. A list of specific environmental covariates to use can be provided.

A recipe is created with the recipes package to specify additional preprocessing steps, such as standardization based on the training set, with the same transformations applied to the test set. Variables with zero variance are removed. If the year effect is included, it is converted to dummy variables.
A random forest model is then fitted on the training set (see function fit_cv_split.rf_reg_3()).
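The preprocessing described above can be illustrated with a minimal sketch using the recipes package. It assumes a hypothetical toy data set (columns year, temp, rain, const, yield); the actual recipe built by rf_reg_3() may use different column names, roles and step order. The centering and scaling statistics are estimated on the training set by prep() and then reused by bake() on the test set.

```r
library(recipes)

# Hypothetical toy training and test sets (not produced by the package)
training <- data.frame(
  year = factor(c(2018, 2018, 2019, 2019, 2020, 2020)),
  temp = c(20, 22, 19, 25, 24, 21),
  rain = c(100, 80, 120, 90, 110, 95),
  const = 1,  # zero-variance column, expected to be removed
  yield = c(5.1, 5.8, 4.9, 6.2, 6.0, 5.4)
)
test <- data.frame(
  year = factor(c(2019, 2020), levels = levels(training$year)),
  temp = c(23, 18),
  rain = c(105, 85),
  const = 1,
  yield = c(5.5, 5.0)
)

rec <- recipe(yield ~ year + temp + rain + const, data = training)
rec <- step_zv(rec, all_predictors())                # drop zero-variance predictors
rec <- step_normalize(rec, all_numeric_predictors()) # center/scale with training stats
rec <- step_dummy(rec, year)                         # year factor -> dummy columns

prepped <- prep(rec, training = training)            # estimate means/SDs on training only
train_processed <- bake(prepped, new_data = NULL)    # processed training set
test_processed <- bake(prepped, new_data = test)     # test set, training statistics reused
```

Because prep() only sees the training data, the test set is standardized with the training means and standard deviations, so no information from the test set leaks into preprocessing.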

This prediction method can be very slow, depending on the number of SNP variables used.

new_rf_reg_3(
  split = NULL,
  trait = NULL,
  geno = NULL,
  env_predictors = NULL,
  info_environments = NULL,
  use_selected_markers = F,
  SNPs = NULL,
  include_env_predictors = T,
  list_env_predictors = NULL,
  lat_lon_included = F,
  year_included = F,
  ...
)

rf_reg_3(
  split,
  trait,
  geno,
  env_predictors,
  info_environments,
  use_selected_markers,
  SNPs,
  list_env_predictors,
  include_env_predictors,
  lat_lon_included,
  year_included,
  ...
)

validate_rf_reg_3(x, ...)

Arguments

split

An object of class split. A split object contains a training element and a test element.

trait

character Name of the trait to predict. An ordinal trait should be encoded as an integer.

geno

data.frame It corresponds to the geno element within an object of class METData.

env_predictors

data.frame It corresponds to the env_data element within an object of class METData.

info_environments

data.frame It corresponds to the info_environments element within an object of class METData.

use_selected_markers

A logical indicating whether to use, as predictor variables, a subset of markers identified either via single-environment GWAS or from the table of marker effects obtained via Elastic Net, when main genetic effects are modeled with principal components.
If use_selected_markers is TRUE, the SNPs argument should be provided. For more details, see select_markers().

SNPs

A data.frame with the genotype matrix (individuals in rows and selected markers in columns) for SNPs selected via the select_markers() function. Optional argument: it can remain NULL if no individual markers should be incorporated as predictor variables in analyses based on PCA decomposition.
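The interaction between use_selected_markers and SNPs can be sketched as follows. This is a hypothetical helper, not a function of the package: it assumes that genotype IDs are stored in a column named geno_ID of the training or test data and as the row names of SNPs.

```r
# Hypothetical helper (not part of the package): append selected marker
# columns to a training or test set, matched on genotype ID.
add_selected_markers <- function(df, SNPs = NULL, use_selected_markers = FALSE,
                                 id_col = "geno_ID") {
  if (!use_selected_markers) {
    return(df)  # markers not requested: data left unchanged
  }
  if (is.null(SNPs)) {
    stop("SNPs must be provided when use_selected_markers is TRUE")
  }
  idx <- match(df[[id_col]], rownames(SNPs))
  if (anyNA(idx)) {
    stop("Some genotypes in `df` are missing from `SNPs`")
  }
  markers <- SNPs[idx, , drop = FALSE]
  rownames(markers) <- NULL  # avoid made-unique row names from repeated genotypes
  cbind(df, markers)
}

# Toy example: genotype G1 is observed in two environments
SNPs <- data.frame(m1 = c(0, 2), m2 = c(1, 1), row.names = c("G1", "G2"))
training <- data.frame(geno_ID = c("G1", "G2", "G1"),
                       IDenv = c("E1", "E1", "E2"),
                       yield = c(5.2, 4.8, 5.9))
with_markers <- add_selected_markers(training, SNPs, use_selected_markers = TRUE)
```

Each row receives the marker genotypes of its own individual, so a genotype tested in several environments carries identical marker values in every row.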

include_env_predictors

A logical indicating whether environmental covariates characterizing each environment should be used in predictions.

list_env_predictors

A character vector containing the names of the environmental predictors which should be used in predictions. Default is NULL, meaning that all environmental predictors included in the env_data table of the METData object will be used.
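The NULL-means-all behavior of list_env_predictors can be sketched with a small helper. The function name and the IDenv identifier column are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical helper (not part of the package): keep only the requested
# environmental covariates; NULL means "use all of them".
select_env_predictors <- function(env_data, list_env_predictors = NULL,
                                  id_col = "IDenv") {
  if (is.null(list_env_predictors)) {
    return(env_data)  # default: every covariate in env_data is kept
  }
  unknown <- setdiff(list_env_predictors, names(env_data))
  if (length(unknown) > 0) {
    stop("Unknown environmental predictors: ", paste(unknown, collapse = ", "))
  }
  env_data[, c(id_col, list_env_predictors), drop = FALSE]
}

# Toy env_data table with three covariates
env_data <- data.frame(IDenv = c("E1", "E2"),
                       temp = c(21.5, 24.0),
                       rain = c(110, 85),
                       srad = c(18.2, 20.1))
subset_env <- select_env_predictors(env_data, c("temp", "srad"))
```

Failing early on unknown names makes a misspelled covariate visible instead of silently dropping it.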

lat_lon_included

logical Indicates whether longitude and latitude data should be used as numeric predictors. Default is FALSE.

year_included

logical Indicates whether the year factor should be used as a predictor variable. Default is FALSE.

Value

A list object of class rf_reg_3 with the following items:

training

data.frame Training set after partial processing

test

data.frame Test set after partial processing

rec

A recipe object specifying the remaining processing steps, which are applied when a model is fitted on the training set using this recipe.

References

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, others (2019). “Welcome to the Tidyverse.” Journal of Open Source Software, 4(43), 1686.

Kuhn M, Wickham H (2020). Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles. https://www.tidymodels.org.