Compile a specialized data frame based on the rvtable package using distribution data frames of SNAP downscaled climate data.

dist_data(data, variable, margin = NULL, seed = NULL, metric = NULL,
  year_range, rcp_min_yr, base_max_yr, all_models, baseline_model = NULL,
  composite = "Composite GCM", baseline_scenario = "Historical",
  general_scenario = "Projected", margin_drop = c(baseline_scenario,
  baseline_model), density_size = 200, margin_size = 100,
  sample_size = margin_size, limit_sample = TRUE, baseline_only = FALSE,
  progress = TRUE)

Arguments

data

a data frame. It does not need to be an rvtable-class data frame in advance, but it must be coercible to one.

variable

character, a valid random variable. See details for currently available options.

margin

variable to marginalize over. Defaults to NULL.

seed

numeric or NULL (default), set random seed for reproducible sampling in app.

metric

NULL or logical. If TRUE, output data are in metric units; if FALSE, in US Standard units. Input values in data are assumed to be metric. If NULL (default), no conversion or climate variable-specific rounding is performed.

year_range

full range of years in data set.

rcp_min_yr

minimum year for RCP, e.g., for CMIP5 data this is 2006.

base_max_yr

maximum year for the baseline historical comparison data set that sometimes accompanies GCM data, e.g., 2015 for CRU 4.0 observation-based data.

all_models

character, vector of climate model names in the data set, including the baseline model if present.

baseline_model

character, name of baseline model in data set, e.g., "CRU 4.0".

composite

character, name to use for composite climate models after marginalizing over models.

baseline_scenario

character, defaults to "Historical".

general_scenario

character, defaults to "Projected".

margin_drop

levels of variables to exclude from marginalizing operations on those variables. Defaults to the baseline scenario and baseline model.

density_size

numeric, sample size for density estimations. Defaults to 200.

margin_size

numeric, sample size for marginalizing operations. Defaults to 100.

sample_size

numeric, sample size for density estimations. Defaults to margin_size.

limit_sample

logical, see details.

baseline_only

logical, process only the baseline data set. Useful for climatology data.

progress

logical, include progress bar in app.

Value

a specialized data frame

Details

This is a specialized function suited to preparing reactive data frames for an app where the upstream source data represents an rvtable-class probability density data frame from the rvtable package. Many such data frames of SNAP data are available.

This function assumes the presence of certain data frame columns: Val, Prob, Var, RCP, Model, and Year. It will insert a Decade column. It also checks that the Var column is valid: the data frame may contain only one unique variable in its Var ID column, and that variable must currently be one of "pr", "tas", "tasmin", "tasmax". This is because the current implementation makes certain assumptions about the data based on presently existing realistic use cases.
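The column and Var checks described above can be sketched in base R. This is only an illustration under stated assumptions: check_dist_input is a hypothetical helper, not a function from this package or from rvtable, the example rows are made up, and the Decade convention shown (first year of the decade) is a guess at the format rather than documented behavior.

```r
# Hypothetical helper (not part of any package): mirrors the checks described above.
check_dist_input <- function(data, variable) {
  required <- c("Val", "Prob", "Var", "RCP", "Model", "Year")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0)
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  valid <- c("pr", "tas", "tasmin", "tasmax")
  vars <- unique(as.character(data$Var))
  # Exactly one variable, of a supported type, matching the `variable` argument.
  if (length(vars) != 1 || !vars %in% valid || vars != variable)
    stop("Var must hold exactly one valid variable matching `variable`.")
  # Assumption: Decade is the first year of the decade, e.g., 2006 -> 2000.
  data$Decade <- data$Year - data$Year %% 10
  data
}

# Made-up rows for illustration only.
d <- data.frame(Val = c(1.2, 1.5), Prob = c(0.4, 0.6), Var = "tas",
  RCP = "RCP 6.0", Model = "GFDL-CM3", Year = c(2006L, 2019L))
d <- check_dist_input(d, "tas")
```

Passing variable explicitly, rather than only reading it from data, matches the app use case in which the variable is already known in the session.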

A powerful feature of this function, given an appropriate rvtable data frame, is the ability to marginalize over categorical variables (and meaningfully discrete numeric variables such as year) using the margin argument. The current implementation allows marginalizing over RCPs and/or climate models.

Arguments such as variable and year_range could be derived from data internally, but in the app context these values are already known in the session environment, so there is no need to rescan large data frame columns with every call to dist_data.

Note that during marginalizing operations, baseline historical data sets are not integrated with climate models when integrating over models, and historical climate model years are not integrated with future projections when integrating over RCPs. All categorical variables are factors with explicit levels, not character.
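To make the idea of marginalizing over models while holding out the baseline concrete, here is a minimal base R sketch. It is not the rvtable or dist_data code: marginalize_models is a hypothetical helper, the pooled-sampling approach is an assumed stand-in for whatever the package does internally, and the densities and the "GCM A" and "GCM B" names are made up. The argument names composite, margin_size, and density_size are borrowed from the usage above only for readability.

```r
set.seed(1)
# Made-up discretized densities for two GCMs and one baseline model.
grid <- seq(-5, 5, length.out = 50)
d <- rbind(
  data.frame(Model = "GCM A", Val = grid, Prob = dnorm(grid, -1, 1)),
  data.frame(Model = "GCM B", Val = grid, Prob = dnorm(grid, 1, 1)),
  data.frame(Model = "CRU 4.0", Val = grid, Prob = dnorm(grid, 0, 0.5))
)

# Hypothetical helper: marginalize over Model by pooling equal-sized samples.
marginalize_models <- function(d, drop = "CRU 4.0", composite = "Composite GCM",
                               margin_size = 100, density_size = 200) {
  keep <- d[!d$Model %in% drop, ]
  # Draw margin_size values from each remaining model's discretized density.
  pooled <- unlist(lapply(split(keep, keep$Model, drop = TRUE), function(x)
    sample(x$Val, margin_size, replace = TRUE, prob = x$Prob)))
  # Re-estimate a single density from the pooled sample.
  dens <- density(pooled, n = density_size)
  out <- data.frame(Model = composite, Val = dens$x, Prob = dens$y)
  # Dropped levels pass through unchanged instead of being integrated.
  rbind(d[d$Model %in% drop, ], out)
}

m <- marginalize_models(d)
unique(m$Model)
```

In this sketch the baseline rows are returned untouched alongside the composite, which is the role margin_drop is described as playing.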

If limit_sample = TRUE (default), the final sample size is reduced by a factor proportional to the number of unique RCP-GCM pairs. This helps prevent massive in-app samples when users select large amounts of data from many RCPs and models. A minimum sample size per group is still maintained regardless of how much data is requested. Detailed progress is provided for sampling from distributions and for calculating marginal distributions.
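The effect of limiting the sample can be illustrated with a small sketch. The exact scaling factor and minimum used by dist_data are not stated here, so the rule below is an assumption: limited_sample_size is a hypothetical function, dividing by the number of pairs is one possible "proportional" reduction, and min_size = 10 is an arbitrary floor.

```r
# Hypothetical rule for illustration only; the actual factor and floor may differ.
limited_sample_size <- function(sample_size, n_pairs, min_size = 10) {
  # Shrink the sample as the number of RCP-GCM pairs grows, but never below min_size.
  max(min_size, ceiling(sample_size / n_pairs))
}

one_pair   <- limited_sample_size(100, 1)   # a single RCP-GCM pair
four_pairs <- limited_sample_size(100, 4)   # moderate selection
many_pairs <- limited_sample_size(100, 40)  # large selection: the floor applies
```

Under these assumptions the total number of sampled values stays bounded as more RCPs and models are selected, while each group still keeps enough values to estimate a density.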

Examples

#not run