dist_data.Rd
Compile a specialized data frame based on the rvtable package using distribution data frames of SNAP downscaled climate data.
dist_data(data, variable, margin = NULL, seed = NULL, metric = NULL, year_range, rcp_min_yr, base_max_yr, all_models, baseline_model = NULL, composite = "Composite GCM", baseline_scenario = "Historical", general_scenario = "Projected", margin_drop = c(baseline_scenario, baseline_model), density_size = 200, margin_size = 100, sample_size = margin_size, limit_sample = TRUE, baseline_only = FALSE, progress = TRUE)
Argument | Description |
---|---|
data | a data frame. It does not need to be an |
variable | character, a valid random variable. See details for currently available options. |
margin | variable to marginalize over. Defaults to NULL. |
seed | numeric or NULL. |
metric | |
year_range | full range of years in data set. |
rcp_min_yr | minimum year for RCP, e.g., for CMIP5 data this is 2006. |
base_max_yr | maximum year for the baseline historical comparison data set that sometimes accompanies GCM data, e.g., 2015 for CRU 4.0 observation-based data. |
all_models | character, vector of climate model names in the data set, including the baseline model if present. |
baseline_model | character, name of baseline model in data set, e.g., |
composite | character, name to use for composite climate models after marginalizing over models. |
baseline_scenario | character, defaults to "Historical". |
general_scenario | character, defaults to "Projected". |
margin_drop | levels of variables to exclude from marginalizing operations on those variables. Defaults to the baseline scenario and baseline model. |
density_size | numeric, sample size for density estimations. Defaults to 200. |
margin_size | numeric, sample size for marginalizing operations. Defaults to 100. |
sample_size | numeric, sample size for density estimations. Defaults to margin_size. |
limit_sample | logical, see details. |
baseline_only | logical, process only the baseline data set. Useful for climatology data. |
progress | logical, include progress bar in app. |
a specialized data frame
This is a specialized function suited to preparing reactive data frames for an app where the upstream source data represents an rvtable-class probability density data frame from the rvtable package.
Many such data frames of SNAP data are available.
This function assumes the presence of certain data frame columns: Val, Prob, Var, RCP, Model, and Year.
It will insert a Decade column. It will check to ensure a valid Var column, meaning a data frame can contain only one unique variable in its Var ID column, and it must currently be one of "pr", "tas", "tasmin", "tasmax".
This is because the current implementation makes certain assumptions about the data based on presently existing realistic use cases.
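The column and variable checks described above can be sketched in base R. This is a minimal illustration, not the package's code: check_dist_input is a hypothetical helper name, and the numeric decade-start format of the Decade column is an assumption.

```r
# Hypothetical helper, not part of the documented API.
check_dist_input <- function(x, valid_vars = c("pr", "tas", "tasmin", "tasmax")) {
  required <- c("Val", "Prob", "Var", "RCP", "Model", "Year")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # The Var column must hold exactly one supported variable.
  v <- unique(as.character(x$Var))
  if (length(v) != 1 || !v %in% valid_vars) {
    stop("Var must contain exactly one of: ", paste(valid_vars, collapse = ", "))
  }
  # Assumption: Decade is the year rounded down to a multiple of ten.
  x$Decade <- x$Year - x$Year %% 10
  x
}

# Toy input with made-up values.
d <- data.frame(
  Val = c(-5.2, -4.8, -3.9),
  Prob = c(0.2, 0.5, 0.3),
  Var = "tas",
  RCP = "RCP 6.0",
  Model = "GFDL-CM3",
  Year = c(2006, 2015, 2021),
  stringsAsFactors = FALSE
)
out <- check_dist_input(d)
```

Dropping any required column from d, or mixing two variables in Var, makes the helper stop with an error instead of returning a data frame.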
A powerful feature of this function, given an appropriate rvtable data frame, is the ability to marginalize over categorical variables (and meaningfully discrete numeric variables such as year) using the margin argument.
The current implementation allows marginalizing over RCPs and/or climate models.
Arguments such as variable and year_range can be determined internally from data directly, but in the app context these values are already determined in the session environment, and there is no need to repeat scans of large data frame columns with every call to dist_data.
Note that during marginalizing operations, baseline historical data sets are not integrated with climate models when integrating over models, and historical climate model years are not integrated with future projections when integrating over RCPs. All categorical variables are factors with explicit levels, not character vectors.
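To make the idea of marginalizing over models while setting baseline levels aside more concrete, here is a rough base R sketch. It is an illustration only and does not use or reproduce the rvtable implementation: marginalize_models and make_dens are hypothetical names, the data are synthetic, models are weighted equally, and Model is kept as character rather than factor for brevity.

```r
set.seed(1)

# Toy discrete densities (Val/Prob pairs) for two GCMs and one baseline data set.
make_dens <- function(model, mean) {
  val <- seq(mean - 3, mean + 3, length.out = 50)
  prob <- dnorm(val, mean)
  data.frame(Model = model, Val = val, Prob = prob / sum(prob),
             stringsAsFactors = FALSE)
}
d <- rbind(make_dens("GCM A", 0), make_dens("GCM B", 2), make_dens("CRU 4.0", -1))

# Hypothetical stand-in for marginalizing over the Model column.
marginalize_models <- function(x, margin_drop, composite = "Composite GCM",
                               margin_size = 100, density_size = 200) {
  keep <- x[x$Model %in% margin_drop, ]   # excluded levels pass through unchanged
  pool <- x[!x$Model %in% margin_drop, ]
  # Draw margin_size values from each remaining model's distribution and pool them.
  draws <- unlist(lapply(split(pool, pool$Model, drop = TRUE), function(g) {
    sample(g$Val, margin_size, replace = TRUE, prob = g$Prob)
  }))
  # Re-estimate one density from the pooled draws.
  dens <- density(draws, n = density_size)
  comp <- data.frame(Model = composite, Val = dens$x, Prob = dens$y / sum(dens$y),
                     stringsAsFactors = FALSE)
  rbind(keep, comp)
}

out <- marginalize_models(d, margin_drop = "CRU 4.0")
table(out$Model)
```

In this sketch the two GCMs collapse into a single composite level, while the baseline rows listed in margin_drop are returned untouched.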
If limit_sample = TRUE (default), the final sample size is reduced by a factor proportional to the number of unique RCP-GCM pairs.
This helps prevent massive in-app samples when users select large amounts of data from many RCPs and models.
A minimum sample size per group is still maintained regardless of how much data is requested.
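The effect of limiting the sample can be illustrated with a small calculation. The exact rule used by dist_data is not stated here, so the following is only one plausible form: limited_sample_size is a hypothetical function, and both the division by the number of RCP-GCM pairs and the min_size floor of 10 are assumptions.

```r
# Hypothetical rule: shrink the per-group sample size as the number of
# RCP-GCM pairs grows, but never go below a minimum per-group size.
limited_sample_size <- function(sample_size, n_rcp, n_model, min_size = 10) {
  n_pairs <- n_rcp * n_model
  max(min_size, round(sample_size / n_pairs))
}

one_pair    <- limited_sample_size(100, n_rcp = 1, n_model = 1)
six_pairs   <- limited_sample_size(100, n_rcp = 2, n_model = 3)
fifteen_pairs <- limited_sample_size(100, n_rcp = 3, n_model = 5)
c(one_pair, six_pairs, fifteen_pairs)
```

Under these assumptions a single RCP-GCM pair keeps the full sample size, while large selections fall back to the per-group minimum rather than shrinking toward zero.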
Detailed progress is provided for sampling from distributions and for calculating marginal distributions.
#not run