This function combines all available data sources (genotypic data, phenotypic data, information about the environments, and environmental data if available) into a single data object of class METData.

new_create_METData(
  geno = NULL,
  map = NULL,
  pheno = NULL,
  info_environments = NULL,
  raw_weather_data = NULL,
  climate_variables = NULL,
  soil_variables = NULL,
  compute_climatic_ECs = FALSE,
  path_to_save = NULL,
  as_test_set = FALSE,
  get_public_soil_data = FALSE,
  ...
)

create_METData(
  geno = NULL,
  pheno = NULL,
  info_environments = NULL,
  map = NULL,
  climate_variables = NULL,
  compute_climatic_ECs = FALSE,
  soil_variables = NULL,
  raw_weather_data = NULL,
  path_to_save = NULL,
  ...
)

validate_create_METData(x, ...)

Arguments

geno

numeric genotype values stored in a matrix or data.frame which contains the geno_ID as row.names and markers as columns.

map

data.frame object with 3 columns.

  1. marker character with marker names

  2. chr numeric with chromosome number

  3. pos numeric with marker position.

The map object is optional.
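A minimal sketch of inputs shaped as described for geno and map, using base R only. The genotype IDs, marker names and positions are invented for illustration, and the final marker check is an assumption rather than a documented requirement.

```r
# Toy genotype matrix: 3 genotypes x 4 markers, coded as numeric values.
set.seed(1)
geno <- matrix(
  sample(c(0, 1, 2), size = 12, replace = TRUE),
  nrow = 3,
  dimnames = list(
    c("G1", "G2", "G3"),       # geno_ID as row.names
    c("m1", "m2", "m3", "m4")  # markers as columns
  )
)

# Map with the three documented columns: marker, chr, pos.
map <- data.frame(
  marker = colnames(geno),
  chr = c(1, 1, 2, 2),
  pos = c(1050, 20431, 512, 88210),
  stringsAsFactors = FALSE
)

# Assumed sanity check (not stated in this documentation):
# every marker listed in the map is also a column of geno.
markers_ok <- all(map$marker %in% colnames(geno))
```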

pheno

data.frame object with at least 4 columns.

  1. geno_ID character contains the genotype identifiers.

  2. year numeric contains the year of the observation.

  3. location character contains the name of the location.

From the fourth column onward, each column is numeric and contains the values of one phenotypic trait observed in a Year x Location combination. Trait names should be provided as column names.

  • The geno_ID must be a subset of the row.names in the geno object.
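The pheno layout and the subset requirement on geno_ID can be checked before building the object. The sketch below uses base R with invented trait names and values; `geno_ids` is a stand-in for `row.names(geno)`.

```r
# Stand-in for row.names(geno); hypothetical genotype identifiers.
geno_ids <- c("G1", "G2", "G3")

pheno <- data.frame(
  geno_ID = c("G1", "G2", "G1", "G3"),
  year = c(2020, 2020, 2021, 2021),
  location = c("LocA", "LocA", "LocB", "LocB"),
  yield = c(8.2, 7.9, 9.1, 8.4),         # trait 1 (hypothetical)
  plant_height = c(210, 198, 225, 204),  # trait 2 (hypothetical)
  stringsAsFactors = FALSE
)

# Documented requirement: geno_ID must be a subset of row.names(geno).
missing_ids <- setdiff(pheno$geno_ID, geno_ids)
ids_ok <- length(missing_ids) == 0

# Columns 4 onward should all be numeric trait columns.
trait_cols <- pheno[, 4:ncol(pheno), drop = FALSE]
traits_ok <- all(vapply(trait_cols, is.numeric, logical(1)))
```

If `ids_ok` is FALSE, `missing_ids` lists the phenotyped genotypes that have no genotype data.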

info_environments

data.frame object with at least the 4 following columns.

  1. year: numeric Year label of the environment

  2. location: character Name of the location

  3. longitude: numeric longitude of the environment

  4. latitude: numeric latitude of the environment

The next two columns (planting.date and harvest.date) are required only if weather data should be retrieved from NASA POWER by setting the argument compute_climatic_ECs to TRUE, or if raw weather data are provided. The elevation column is optional:

  1. planting.date: (optional) Date YYYY-MM-DD

  2. harvest.date: (optional) Date YYYY-MM-DD

  3. elevation: (optional) numeric

  • The data.frame should contain as many rows as there are Year x Location combinations used in pheno.
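As an illustration of the row-count rule, the sketch below builds a small info_environments table and compares it with the Year x Location combinations of a hypothetical pheno table. Coordinates and dates are invented, and the date-order check is an added assumption, not a documented requirement.

```r
# Year x Location combinations present in a hypothetical pheno table.
pheno_envs <- unique(data.frame(
  year = c(2020, 2020, 2021),
  location = c("LocA", "LocA", "LocB"),
  stringsAsFactors = FALSE
))

info_environments <- data.frame(
  year = c(2020, 2021),
  location = c("LocA", "LocB"),
  longitude = c(9.93, 10.45),
  latitude = c(51.53, 52.27),
  planting.date = as.Date(c("2020-04-20", "2021-04-25")),
  harvest.date = as.Date(c("2020-10-01", "2021-10-05")),
  stringsAsFactors = FALSE
)

# Documented rule: one row per Year x Location combination.
rows_match <- nrow(info_environments) == nrow(pheno_envs)

# Assumed common-sense check: harvest happens after planting.
dates_ok <- all(info_environments$harvest.date > info_environments$planting.date)
```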

raw_weather_data

data.frame. Can be left as NULL if no daily weather datasets are available. Otherwise, the following required columns should be provided (column names must be respected):

  1. longitude numeric

  2. latitude numeric

  3. year numeric

  4. location character

  5. YYYYMMDD Date

Weather variables provided by the user must be a subset of the following weather variable names. Column names must be given as follows:

  1. T2M numeric Daily mean temperature (°C)

  2. T2M_MIN numeric Daily minimum temperature (°C)

  3. T2M_MAX numeric Daily maximum temperature (°C)

  4. PRECTOTCORR numeric Daily total precipitation (mm)

  5. RH2M numeric Daily mean relative humidity (%)

  6. RH2M_MIN numeric Daily minimum relative humidity (%)

  7. RH2M_MAX numeric Daily maximum relative humidity (%)

  8. daily_solar_radiation numeric daily solar radiation (MJ/m^2/day)

  9. top_atmosphere_insolation numeric Top-of-atmosphere Insolation (MJ/m^2/day)

  10. T2MDEW numeric Dew Point (°C)

It is not required that the user provides weather data for ALL environments. If weather data for some environments are missing, they will be retrieved by the NASA POWER query.
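Because column names must be respected, it can help to check a raw weather table against the names listed above before passing it on. The helper `check_weather_columns()` below is hypothetical (it is not part of the package) and the weather values are invented.

```r
# Column names taken from the lists above.
id_cols <- c("longitude", "latitude", "year", "location", "YYYYMMDD")
weather_vars <- c(
  "T2M", "T2M_MIN", "T2M_MAX", "PRECTOTCORR", "RH2M", "RH2M_MIN",
  "RH2M_MAX", "daily_solar_radiation", "top_atmosphere_insolation", "T2MDEW"
)

# Three days of field weather data for one environment (invented values).
raw_weather_data <- data.frame(
  longitude = 9.93,
  latitude = 51.53,
  year = 2020,
  location = "LocA",
  YYYYMMDD = seq(as.Date("2020-04-20"), by = "day", length.out = 3),
  T2M = c(9.8, 11.2, 12.5),
  PRECTOTCORR = c(0, 2.4, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reports missing identifier columns and
# columns that are not among the accepted weather variable names.
check_weather_columns <- function(df) {
  list(
    missing_ids = setdiff(id_cols, names(df)),
    unknown_vars = setdiff(names(df), c(id_cols, weather_vars))
  )
}

res <- check_weather_columns(raw_weather_data)
```

Both elements of `res` are empty when the table follows the naming scheme; a misnamed column such as `TEMP` would show up in `unknown_vars`.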

climate_variables

data.frame. Can be left as NULL if no climate variables are provided as input. Otherwise, the data.frame should contain as many rows as the info_environments data.frame.
Columns should be:

  1. year numeric with the year label

  2. location character with the location name

Columns 3 onward should be numeric and contain the climate (weather-based) covariates.

  • If climate_variables is provided, compute_climatic_ECs should be set to FALSE.
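A short sketch of a user-supplied climate_variables table and the consistency checks implied by the description. The covariate names and values are hypothetical, and matching the environments by a year/location key is an assumed way of checking the correspondence with info_environments.

```r
info_environments <- data.frame(
  year = c(2020, 2021),
  location = c("LocA", "LocB"),
  longitude = c(9.93, 10.45),
  latitude = c(51.53, 52.27),
  stringsAsFactors = FALSE
)

# Hypothetical covariates, one row per environment.
climate_variables <- data.frame(
  year = c(2020, 2021),
  location = c("LocA", "LocB"),
  mean_temp_season = c(16.4, 17.1),
  total_rain_season = c(312, 287),
  stringsAsFactors = FALSE
)

# Documented rule: same number of rows as info_environments.
same_rows <- nrow(climate_variables) == nrow(info_environments)

# Assumed check: both tables describe the same environments.
env_key <- function(df) paste(df$year, df$location, sep = "_")
same_envs <- setequal(env_key(climate_variables), env_key(info_environments))

# Columns 3 onward must be numeric covariates.
covariates <- climate_variables[, -(1:2), drop = FALSE]
covariates_numeric <- all(vapply(covariates, is.numeric, logical(1)))
```

The same layout applies to soil_variables, with soil-based covariates in columns 3 onward.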

soil_variables

data.frame. Can be left as NULL if no soil variables are provided as input. Otherwise, the data.frame should contain as many rows as the info_environments data.frame.
Columns should be:

  1. year numeric with the year label

  2. location character with the location name

Columns 3 onward should be numeric and contain the soil-based environmental covariates.

compute_climatic_ECs

logical. Indicates whether climatic covariates should be computed by the function. Default is FALSE.
Set compute_climatic_ECs = TRUE to use weather data from NASA POWER, or to use raw weather data when these are available. Field weather data can also be provided for only some environments; weather data for the other environments present in the dataset will then be retrieved using the NASA POWER query.

path_to_save

Path where daily weather data (if retrieved) and plots based on k-means clustering are saved.

as_test_set

Should be set to TRUE when the data are used as a prediction set (i.e. no phenotypic values are available for the new data to predict). Default is FALSE.

get_public_soil_data

logical Indicates whether public soil data should be downloaded.

Value

A formatted list of class METData which contains the following elements:

  • geno: matrix with genotype values of phenotyped individuals.

  • map: data.frame with genetic map.

  • pheno: data.frame with phenotypic trait values.

  • compute_EC_by_geno: logical indicates if environmental covariates were required to be retrieved via the package by the user.

  • env_data: data.frame with the environmental covariates per environment

  • list_climatic_predictors: character with the names of the climatic predictor variables

  • list_soil_predictors: character with the names of the soil-based predictor variables

  • info_environments: data.frame contains basic information on each environment.

  • ECs_computed: logical subelement added in the output to indicate if the function get_ECs() was run within the pipeline.

  • climate_data_retrieved: logical subelement added in the output to indicate if NASAPOWER data were retrieved within the pipeline.

Author

Cathy C. Westhues cathy.jubin@uni-goettingen.de

Examples

data(geno_G2F)
data(pheno_G2F)
data(map_G2F)
data(info_environments_G2F)
data(soil_G2F)
# Create METData and get climate variables from NASAPOWER data & use soil variables
METdata_G2F <- create_METData(
  geno = geno_G2F,
  pheno = pheno_G2F,
  map = map_G2F,
  climate_variables = NULL,
  compute_climatic_ECs = TRUE,
  info_environments = info_environments_G2F,
  soil_variables = soil_G2F,
  path_to_save = "~/g2f_data"
)
#> No climate covariates provided by the user.
#> Warning: Coercing info_environments$planting.date to class 'POSIXct'.
#> Warning: Coercing info_environments$harvest.date to class 'POSIXct'.
#> Step 1: Processing/Retrieval of daily weather data starts!
#> Daily weather tables have been downloaded from NASA POWER for the required environments in a previous run, and are matching the environments ID/planting and harvest dates used in this analysis.
#> These data will be used.
#> Daily weather tables downloaded from NASA POWER for the required environments!
#> Step 1 is done!
#> Step 2: Aggregation of daily weather data into covariavate starts!
#> Step 2 is done!
#> Computation of environmental covariates is done.
#> Clustering of env. data starts.
#> Clustering of env. data done.
#> Soil and climate data will be included in the final METData object.

data(geno_indica)
data(map_indica)
data(pheno_indica)
data(info_environments_indica)
data(climate_variables_indica)
METdata_indica <- create_METData(
  geno = geno_indica,
  pheno = pheno_indica,
  climate_variables = climate_variables_indica,
  compute_climatic_ECs = FALSE,
  info_environments = info_environments_indica,
  map = map_indica,
  path_to_save = "~/indica"
)
#> No soil covariates provided by the user.
#> Clustering of env. data starts.
#> Clustering of env. data done.

data(geno_japonica)
data(map_japonica)
data(pheno_japonica)
data(info_environments_japonica)
data(climate_variables_japonica)
METdata_japonica <- create_METData(
  geno = geno_japonica,
  pheno = pheno_japonica,
  climate_variables = climate_variables_japonica,
  compute_climatic_ECs = FALSE,
  info_environments = info_environments_japonica,
  map = map_japonica,
  path_to_save = "~/japonica"
)
#> No soil covariates provided by the user.
#> Clustering of env. data starts.
#> Clustering of env. data done.