Class constructor for rvtable objects.

rvtable(x, y = NULL, Val, Prob, discrete = FALSE, density.args = list(),
  weights = list(), force.dist = TRUE)

Arguments

x

a numeric vector or data frame.

y

an optional vector of probabilities associated with x, used when x is a numeric vector that does not already carry a probabilities attribute.

Val

the column name of x referring to random variable values when x is a data frame.

Prob

the column name of x referring to random variable probabilities when x is a data frame.

discrete

whether the random variable is discrete.

density.args

optional arguments passed to density.

weights

NULL or a list of weights associated with levels of ID variables in `x`. See details.

force.dist

logical; if Prob is missing, so that Val is assumed to be a sample, force distribution-type rvtable output. Defaults to TRUE.

Value

an object of class rvtable.

Details

These are long format data tables containing a Val and Prob column describing the distribution of a random variable. Any other columns are ID columns and should be categorical variables. The random variable described by Val and Prob may be discrete or continuous. When discrete, probabilities are true probabilities. When continuous, Val and Prob are based on x and y output from density and describe a distribution curve, and therefore values in Prob may be greater than one and may not sum to one. Val is typically numeric but may be character when discrete such as when an rvtable object is returned from inverse_pmf.
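The continuous case can be illustrated with base R alone. The sketch below calls density on a tightly concentrated sample and arranges the result in the long Val/Prob layout described above. The sample, the seed, and the names s, dens, and d are illustrative assumptions, and the rvtable constructor itself is not called here.

```r
# Base R only: a kernel density estimate laid out as Val/Prob columns.
set.seed(1)
s <- rnorm(100, mean = 0, sd = 0.1)  # narrow sample, so the curve is tall

dens <- density(s)                   # default grid of 512 points
d <- data.frame(Val = dens$x, Prob = dens$y)

nrow(d)                              # 512 rows, one per grid point
max(d$Prob) > 1                      # TRUE: curve heights are not probabilities
sum(d$Prob)                          # far from 1: the heights do not sum to one
sum(d$Prob) * diff(d$Val[1:2])       # close to 1: approximate area under the curve
```

The last line shows why values in Prob can exceed one for a continuous variable: it is the area under the curve, not the sum of the heights, that is approximately one.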

For data frame inputs, Val and Prob here refer generally to whichever columns of the data frame are specified by the Val and Prob arguments. If the Val argument is not supplied, rvtable assumes Val="Val" and searches the names of the data frame for that column, throwing an error if it is not found, as it would for any other value of Val. A missing Prob is handled differently: it is analogous to the case where x is numeric and both y and the probabilities attribute of x are NULL. The data in the Val column are then assumed to be a direct sample from a distribution rather than a vector of values that describes a distribution in conjunction with a Prob column. When x is numeric, supplied Val and Prob arguments replace the default output column names x and y, respectively.
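The three cases in the paragraph above can be sketched as follows. This assumes the rvtable package is installed (the block is skipped otherwise); the data frames df and smp and the column names v1, p1, value, and p are made up for illustration, and the behavior noted in the comments is taken from the description above rather than verified output.

```r
# Assumes the rvtable package is installed; skipped otherwise.
if (requireNamespace("rvtable", quietly = TRUE)) {
  library(rvtable)

  # Data frame with non-default column names: name both columns explicitly.
  df <- data.frame(v1 = 1:4, p1 = c(0.1, 0.2, 0.3, 0.4))
  rv_dist <- rvtable(df, Val = "v1", Prob = "p1")

  # Data frame with a value column only: no probabilities are available,
  # so the values are treated as a sample from a distribution.
  smp <- data.frame(Val = rnorm(50))
  rv_smp <- rvtable(smp)

  # Numeric input: Val and Prob are described as replacing the default
  # x and y output column names.
  rv_num <- rvtable(1:4, y = c(0.1, 0.2, 0.3, 0.4), Val = "value", Prob = "p")
  names(rv_num)
}
```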

All rvtable objects are in one of two forms: distribution-type or sample-type. This primary constructor builds either form and sets the corresponding attribute, tabletype="distribution" or tabletype="sample". Sampling on an rvtable can generate a sample-type rvtable, with the attribute tabletype="sample". Other operations, such as merging or marginalizing distributions, typically yield rvtables in distribution form. Distribution form is the more common of the two, and rvtables are usually kept in it until a final step in a processing chain where samples are needed.

Every rvtable object also has a variable type attribute, rvtype, which is either "discrete" or "continuous". Other attributes assigned during rvtable construction include valcol and probcol, the names of the Val and Prob columns, and a density.args attribute that lists the arguments most recently passed to density in the process of making the rvtable.
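One way to inspect these attributes is with base R's attr, as in the sketch below. It assumes the rvtable package is installed (the block is skipped otherwise) and that the attributes are stored under exactly the names given above; the package may also offer its own accessors, which are not shown here.

```r
# Assumes the rvtable package is installed; skipped otherwise.
if (requireNamespace("rvtable", quietly = TRUE)) {
  library(rvtable)

  # A small discrete distribution built from values and probabilities.
  rv <- rvtable(1:5, y = c(0.1, 0.2, 0.3, 0.15, 0.25))

  attr(rv, "tabletype")     # "distribution" or "sample"
  attr(rv, "rvtype")        # "discrete" or "continuous"
  attr(rv, "valcol")        # name of the values column
  attr(rv, "probcol")       # name of the probabilities column
  attr(rv, "density.args")  # arguments most recently passed to density
}
```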

If x is already of class rvtable, the rvtable function simply returns it as is; any other arguments passed to rvtable are ignored, and neither the table nor its attributes are updated or altered in any way.
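The pass-through behavior described above could be checked along these lines. The sketch assumes the rvtable package is installed (the block is skipped otherwise); the extra arguments in the second call are arbitrary and, per the paragraph above, should have no effect.

```r
# Assumes the rvtable package is installed; skipped otherwise.
if (requireNamespace("rvtable", quietly = TRUE)) {
  library(rvtable)

  rv <- rvtable(1:5, y = c(0.1, 0.2, 0.3, 0.15, 0.25))

  # Passing an existing rvtable back in: the extra arguments are ignored.
  rv_again <- rvtable(rv, discrete = FALSE, density.args = list(n = 50))

  identical(rv, rv_again)  # expected to be TRUE if the table is returned as is
}
```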

Weights for levels of an ID variable can be set by set_weights. In rvtable, weights is passed to set_weights so that weights can be set on rvtable construction if desired. It can be relatively cumbersome to set weights for all ID variables at once in a call to rvtable, and particularly wasteful if there are several ID columns and most or all have levels with equal weights. The weights attribute assigned to the rvtable will be an empty list if there are no ID columns in x and a named list of data frames otherwise. The names are the names of the ID variables in x, and each data frame has two columns, levels and weights, giving the weighting of an ID variable's levels. When weights=NULL or weights=list(), the values in the weights column for each level in each data frame are set to 1 for equal weighting. Similarly, individual named list elements can be set to NULL for convenience instead of passing a data frame. NULL or list() elements in the weights list are converted to data frames with weights equal to one. Note that weights is ignored if x is not a data frame.
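The shape of the weights list described above can be sketched in base R. The ID column name id, its levels, and the weight values are hypothetical. The final call assumes the rvtable package is installed (it is skipped otherwise) and that a list of this shape is accepted as is; the exact column types that set_weights expects are not confirmed here.

```r
# A table with one ID column (id) and two levels, A and B.
x <- data.frame(
  id   = rep(c("A", "B"), each = 5),
  Val  = rep(1:5, times = 2),
  Prob = 0.2
)

# Named list of data frames, one per ID variable, each with the
# columns levels and weights. Here level B gets three times the weight of A.
w <- list(id = data.frame(levels = c("A", "B"), weights = c(1, 3)))

# Assumes the rvtable package is installed; skipped otherwise.
if (requireNamespace("rvtable", quietly = TRUE)) {
  library(rvtable)
  rv_w <- rvtable(x, weights = w)
  attr(rv_w, "weights")  # named list of data frames, per the text above
}
```

Leaving weights at its default, or setting an individual element such as id to NULL, is described above as giving every level a weight of one.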

Examples

# basic samples from continuous and discrete RVs
x <- rnorm(100)
rvtable(x)
#> # A tibble: 512 x 2
#>        x        y
#>    <dbl>    <dbl>
#>  1 -3.45 0.000213
#>  2 -3.44 0.000240
#>  3 -3.42 0.000269
#>  4 -3.41 0.000302
#>  5 -3.39 0.000339
#>  6 -3.38 0.000379
#>  7 -3.36 0.000424
#>  8 -3.34 0.000474
#>  9 -3.33 0.000528
#> 10 -3.31 0.000588
#> # ... with 502 more rows

rvtable(x, density.args=list(n=50, adjust=2))
#> # A tibble: 50 x 2
#>        x        y
#>    <dbl>    <dbl>
#>  1 -4.73 0.000191
#>  2 -4.52 0.000428
#>  3 -4.30 0.000906
#>  4 -4.08 0.00181
#>  5 -3.86 0.00342
#>  6 -3.64 0.00614
#>  7 -3.43 0.0104
#>  8 -3.21 0.0169
#>  9 -2.99 0.0261
#> 10 -2.77 0.0385
#> # ... with 40 more rows

rvtable(x, discrete=TRUE) # incorrect: x is a continuous RV
#> # A tibble: 100 x 2
#>        x      y
#>    <dbl>  <dbl>
#>  1 -2.18 0.0100
#>  2 -2.12 0.0100
#>  3 -1.87 0.0100
#>  4 -1.87 0.0100
#>  5 -1.85 0.0100
#>  6 -1.80 0.0100
#>  7 -1.78 0.0100
#>  8 -1.64 0.0100
#>  9 -1.47 0.0100
#> 10 -1.43 0.0100
#> # ... with 90 more rows

x <- sample(1:10, size=30, replace=TRUE, prob=sqrt(10:1))
rvtable(x, discrete=TRUE) # discrete=T only needed if ambiguous
#> # A tibble: 9 x 2
#>       x      y
#>   <dbl>  <dbl>
#> 1  1.00 0.167
#> 2  2.00 0.100
#> 3  3.00 0.133
#> 4  4.00 0.0333
#> 5  5.00 0.167
#> 6  6.00 0.133
#> 7  7.00 0.133
#> 8  8.00 0.0667
#> 9  9.00 0.0667

x <- 1:5
probs <- c(0.1, 0.2, 0.3, 0.15, 0.25)
rvtable(x, y=probs) # discrete inferred from y
#> # A tibble: 5 x 2
#>       x     y
#>   <int> <dbl>
#> 1     1 0.100
#> 2     2 0.200
#> 3     3 0.300
#> 4     4 0.150
#> 5     5 0.250

attr(x, "probabilities") <- probs
rvtable(x) # inferred from attributes (partial match 'prob')
#> # A tibble: 5 x 2
#>       x     y
#>   <int> <dbl>
#> 1     1 0.100
#> 2     2 0.200
#> 3     3 0.300
#> 4     4 0.150
#> 5     5 0.250

# an existing data frame or data table
x <- data.frame(Val=1:10, Prob=0.1)
rvtable(x)
#> # A tibble: 10 x 2
#>      Val  Prob
#>    <int> <dbl>
#>  1     1 0.100
#>  2     2 0.100
#>  3     3 0.100
#>  4     4 0.100
#>  5     5 0.100
#>  6     6 0.100
#>  7     7 0.100
#>  8     8 0.100
#>  9     9 0.100
#> 10    10 0.100

x <- data.frame(id=rep(LETTERS[1:2], each=10), v1=rep(1:10, 2), p1=c(rep(0.1, 10), sqrt(1:10)))
rvtable(x, Val="v1", Prob="p1")
#> # A tibble: 20 x 3
#>    id       v1    p1
#>    <fct> <int> <dbl>
#>  1 A         1 0.100
#>  2 A         2 0.100
#>  3 A         3 0.100
#>  4 A         4 0.100
#>  5 A         5 0.100
#>  6 A         6 0.100
#>  7 A         7 0.100
#>  8 A         8 0.100
#>  9 A         9 0.100
#> 10 A        10 0.100
#> 11 B         1 1.00
#> 12 B         2 1.41
#> 13 B         3 1.73
#> 14 B         4 2.00
#> 15 B         5 2.24
#> 16 B         6 2.45
#> 17 B         7 2.65
#> 18 B         8 2.83
#> 19 B         9 3.00
#> 20 B        10 3.16