The function processes a split object (training and test sets) according to the configuration set by the user. For instance, genomic information is incorporated according to the chosen options, and a list of specific environmental covariates to use can be provided.

Multiple recipes are created with the recipes package, one per data source (genomic data, environmental data). These recipes specify additional preprocessing steps, such as standardization: the parameters are estimated on the training set and the same transformations are then applied to the test set. Variables with null variance are removed. If the year effect is included, it is converted to dummy variables.
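
The preprocessing described above can be sketched with the recipes package. This is only an illustration on toy data, not the package's internal code: the data frames, the column names (yield, year, temp, rain, const) and the order of the steps are assumptions made for the example.

```r
library(recipes)

# Toy training and test sets (hypothetical values and column names).
training <- data.frame(
  yield = c(5.1, 6.3, 5.8, 7.0, 6.1, 5.5),
  year  = factor(c("2017", "2018", "2017", "2018", "2019", "2019")),
  temp  = c(18.2, 20.1, 17.9, 21.3, 19.4, 19.0),
  rain  = c(300, 280, 310, 250, 290, 305),
  const = rep(1, 6)  # null-variance predictor
)
test <- data.frame(
  yield = c(6.0, 5.7),
  year  = factor(c("2018", "2019"), levels = levels(training$year)),
  temp  = c(19.5, 18.8),
  rain  = c(295, 301),
  const = c(1, 1)
)

# Define the steps; nothing is estimated yet.
rec <- recipe(yield ~ ., data = training)
rec <- step_zv(rec, all_predictors())                  # drop null-variance variables
rec <- step_normalize(rec, all_numeric_predictors())   # center and scale
rec <- step_dummy(rec, all_nominal_predictors())       # year -> dummy variables

# Estimate means and standard deviations on the training set only,
# then apply the same transformations to both sets.
prepped     <- prep(rec, training = training)
train_baked <- bake(prepped, new_data = NULL)
test_baked  <- bake(prepped, new_data = test)
```

Because prep() only sees the training set, the test set is scaled with the training means and standard deviations, which avoids leaking test information into the model.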
Three recipes are created: one with only SNP data (rec_G), one with only environmental data (rec_E), and one combining PCs extracted from SNP data with environmental covariates (rec_GE). Each of these recipes will be fitted with a support vector regression model, using a kernel type (linear, polynomial, rbf) that can be chosen by the user. The predictions of these models will then be combined in a stacked model (see function fit_cv_split.stacking_reg_3()).
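
As a rough illustration of the kernel choice, the three kernel types map naturally onto the support vector machine specifications of the parsnip package. The helper make_svr_spec() below is hypothetical, and the engine ("kernlab") and hyperparameter values are assumptions for the sketch; the actual fitting is done in fit_cv_split.stacking_reg_3(), which may set or tune these differently.

```r
library(parsnip)

# Hypothetical helper: return a support vector regression specification
# for the requested kernel. Hyperparameter values are placeholders.
make_svr_spec <- function(kernel = c("rbf", "polynomial", "linear")) {
  kernel <- match.arg(kernel)
  spec <- switch(
    kernel,
    linear     = svm_linear(mode = "regression", cost = 1),
    polynomial = svm_poly(mode = "regression", cost = 1, degree = 2),
    rbf        = svm_rbf(mode = "regression", cost = 1, rbf_sigma = 0.01)
  )
  set_engine(spec, "kernlab")
}

spec_rbf <- make_svr_spec("rbf")
```

A specification such as spec_rbf only describes the model; it would typically be paired with one of the recipes (for example in a workflow) before being fitted on the training set.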

new_stacking_reg_3(
  split = NULL,
  trait = NULL,
  geno = NULL,
  env_predictors = NULL,
  info_environments = NULL,
  use_selected_markers = F,
  SNPs = NULL,
  include_env_predictors = T,
  list_env_predictors = NULL,
  lat_lon_included = F,
  year_included = F,
  ...
)

stacking_reg_3(
  split,
  trait,
  geno,
  env_predictors,
  info_environments,
  use_selected_markers,
  SNPs,
  list_env_predictors,
  include_env_predictors,
  lat_lon_included,
  year_included,
  ...
)

validate_stacking_reg_3(x, ...)

Arguments

split

An object of class split. A split object contains training and test elements.

trait

character Name of the trait to predict. An ordinal trait should be encoded as an integer.

geno

data.frame It corresponds to the geno element within an object of class METData.

env_predictors

data.frame It corresponds to the env_data element within an object of class METData.

info_environments

data.frame It corresponds to the info_environments element within an object of class METData.

use_selected_markers

A logical indicating whether to use, as predictor variables, a subset of markers identified via single-environment GWAS or based on the table of marker effects obtained via Elastic Net, when main genetic effects are modeled with principal components.
If use_selected_markers is TRUE, the SNPs argument should be provided. For more details, see select_markers().

SNPs

A data.frame with the genotype matrix (individuals in rows and selected markers in columns) for SNPs selected via the select_markers() function. This argument is optional and can remain NULL if no single markers should be incorporated as predictor variables in analyses based on PCA decomposition.
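
To make the combination of principal components and selected markers concrete, here is a minimal base R sketch. It is not the package's implementation: the genotype matrix, the individual and marker names, the number of PCs (3) and the vector of selected markers are all hypothetical, and the PCA is done with prcomp() purely for illustration.

```r
set.seed(1)

# Hypothetical genotype matrix: 8 individuals x 20 markers coded 0/1/2.
geno <- matrix(
  sample(0:2, 8 * 20, replace = TRUE),
  nrow = 8,
  dimnames = list(paste0("ind", 1:8), paste0("snp", 1:20))
)
train_ids <- paste0("ind", 1:6)
test_ids  <- paste0("ind", 7:8)

# Markers that are monomorphic in the training set cannot be scaled.
keep <- apply(geno[train_ids, ], 2, var) > 0

# PCA is estimated on the training individuals only.
pca  <- prcomp(geno[train_ids, keep], center = TRUE, scale. = TRUE)
n_pc <- 3
pc_train <- pca$x[, 1:n_pc, drop = FALSE]

# Test individuals are projected onto the training PCs.
pc_test <- predict(pca, newdata = geno[test_ids, keep, drop = FALSE])
pc_test <- pc_test[, 1:n_pc, drop = FALSE]

# Hypothetical set of markers, standing in for the output of select_markers().
selected <- c("snp2", "snp5")

# Predictor matrices: PCs for main genetic effects plus the selected SNPs.
X_train <- cbind(pc_train, geno[train_ids, selected, drop = FALSE])
X_test  <- cbind(pc_test,  geno[test_ids,  selected, drop = FALSE])
```

Leaving SNPs as NULL corresponds to using only the PC columns in this sketch.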

include_env_predictors

A logical indicating whether environmental covariates characterizing each environment should be used in predictions.

list_env_predictors

A character vector containing the names of the environmental predictors which should be used in predictions. By default NULL: all environmental predictors included in the env_data table of the METData object will be used.
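
The default behaviour described above (NULL meaning "use everything") can be illustrated with a small base R helper. The function select_env_predictors(), the identifier column name IDenv and the covariate names are assumptions made for this sketch and are not part of the documented interface.

```r
# Hypothetical environmental data table with one row per environment.
env_data <- data.frame(
  IDenv     = c("loc1_2017", "loc2_2017"),
  temp_mean = c(18.2, 20.1),
  rain_sum  = c(300, 280),
  rad_sum   = c(1500, 1620)
)

# Hypothetical helper: keep all covariates when list_env_predictors is NULL,
# otherwise keep only the requested ones (plus the identifier column).
select_env_predictors <- function(env_data, list_env_predictors = NULL,
                                  id_col = "IDenv") {
  if (is.null(list_env_predictors)) {
    return(env_data)
  }
  available <- setdiff(names(env_data), id_col)
  unknown   <- setdiff(list_env_predictors, available)
  if (length(unknown) > 0) {
    stop("Unknown environmental predictors: ",
         paste(unknown, collapse = ", "))
  }
  env_data[, c(id_col, list_env_predictors), drop = FALSE]
}

env_all    <- select_env_predictors(env_data)
env_subset <- select_env_predictors(env_data, c("temp_mean", "rain_sum"))
```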

lat_lon_included

logical Indicates whether longitude and latitude data should be used as numeric predictors. Default is FALSE.

year_included

logical Indicates whether the year factor should be used as a predictor variable. Default is FALSE.

Value

A list object of class stacking_reg_3 with the following items:

training

data.frame Training set after partial processing

test

data.frame Test set after partial processing

rec_G

A recipe object, specifying the remaining processing steps which are implemented when a model is fitted on the training set with a recipe. Data used are predictors corresponding to genomic data.

rec_E

A recipe object, specifying the remaining processing steps which are implemented when a model is fitted on the training set with a recipe. Data used are predictors corresponding to environmental predictors.

References

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, others (2019). “Welcome to the Tidyverse.” Journal of Open Source Software, 4(43), 1686.

Kuhn M, Wickham H (2020). Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles. https://www.tidymodels.org.