R/rvtable.R
Class constructor for rvtable objects.
rvtable(x, y = NULL, Val, Prob, discrete = FALSE, density.args = list(), weights = list(), force.dist = TRUE)
Argument | Description |
---|---|
x | a numeric vector or data frame. |
y | an optional vector of probabilities associated with |
Val | the column name of |
Prob | the column name of |
discrete | whether the random variable is discrete. |
density.args | optional arguments passed to |
weights | |
force.dist | logical, force distribution-type rvtable output if |
An object of class rvtable.
These are long format data tables containing a Val and Prob column describing the distribution of a random variable.
Any other columns are ID columns and should be categorical variables. The random variable described by Val and Prob may be discrete or continuous.
When discrete, probabilities are true probabilities.
When continuous, Val and Prob are based on the x and y output from density and describe a distribution curve, so values in Prob may be greater than one and may not sum to one.
Val is typically numeric but may be character when discrete, such as when an rvtable object is returned from inverse_pmf.
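The continuous case can be illustrated with base R alone. The following is a minimal sketch, not the rvtable implementation: it builds a Val/Prob table directly from density output for a narrow normal sample and shows that the Prob values exceed one and do not sum to one, even though the area under the curve is close to one. The seed, the sample size, and the name curve_tbl are assumptions made for illustration.

```r
# Base R sketch (assumption: this only mimics the Val/Prob layout, it does not call rvtable)
set.seed(1)
s <- rnorm(1000, mean = 0, sd = 0.1)  # narrow distribution, so the density curve is tall

d <- density(s)                       # default grid of 512 points
curve_tbl <- data.frame(Val = d$x, Prob = d$y)

max_prob <- max(curve_tbl$Prob)               # peak height of the curve, well above 1
sum_prob <- sum(curve_tbl$Prob)               # plain sum of heights, not a probability
area     <- sum_prob * (d$x[2] - d$x[1])      # Riemann approximation of the area

print(c(max = max_prob, sum = sum_prob, area = area))
```

Only the area, not the raw Prob column, behaves like a total probability here.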
For data frame inputs, Val and Prob here refer generally to whatever columns in an rvtable are specified by the Val and Prob arguments.
In rvtable, if the Val argument is not supplied, it is assumed to be Val="Val", and rvtable will search the names of the data frame for this column, throwing an error if it is not found, as with any other value of Val.
When Prob is missing, however, this is analogous to the case where x is numeric and both y and the probabilities attribute of x are NULL: the data in the Val column are assumed to be a direct sample from a distribution rather than a vector of values that describes a distribution in conjunction with a Prob column.
When x is numeric, supplied Val and Prob values will substitute for the default rvtable column names x and y in the output, respectively.
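The difference between a direct sample and a Val/Prob description of a distribution can be sketched in base R. This is an illustration under assumptions, not the package's internal logic: it takes a discrete sample and summarizes it as relative frequencies, which is the kind of table a discrete distribution-type rvtable holds. The names smp and dist_tbl and the seed are hypothetical.

```r
# Base R sketch (assumption: relative frequencies stand in for the Prob column)
set.seed(42)
smp <- sample(1:10, size = 30, replace = TRUE, prob = sqrt(10:1))  # a direct sample

# Summarize the sample as a distribution: one row per observed value
freq <- prop.table(table(smp))
dist_tbl <- data.frame(
  Val  = as.integer(names(freq)),
  Prob = as.numeric(freq)
)

print(dist_tbl)
```

The sample has 30 entries, while the distribution table has at most one row per distinct value, with probabilities that sum to one.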
All rvtable objects are in one of two forms: distribution-type or sample-type.
This primary constructor constructs distribution- or sample-type rvtable objects, with the corresponding attribute tabletype="distribution" or tabletype="sample".
Sampling on an rvtable can generate a sample-type rvtable, with the attribute tabletype="sample".
Other operations like merging or marginalizing distributions typically yield rvtables in distribution form.
This is the common form, and rvtables are usually kept in this form until a final step in a processing chain where samples are needed.
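Moving from distribution form to sample form amounts to drawing values according to the Prob column. A minimal base R sketch of that idea follows; it uses sample from base R rather than any rvtable sampling function, and the names dist_tbl, draws, and sample_tbl are assumptions for illustration.

```r
# Base R sketch (assumption: sample() stands in for the package's own sampling step)
set.seed(123)
dist_tbl <- data.frame(Val = 1:5, Prob = c(0.1, 0.2, 0.3, 0.15, 0.25))

# Draw 1000 values with probability proportional to Prob
draws <- sample(dist_tbl$Val, size = 1000, replace = TRUE, prob = dist_tbl$Prob)

# A sample-type table holds the drawn values only; there is no Prob column
sample_tbl <- data.frame(Val = draws)

print(head(sample_tbl))
```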
Every rvtable object also has a variable type attribute, rvtype, which is either "discrete" or "continuous".
Other attributes assigned during rvtable construction include valcol and probcol, the names of the Val and Prob columns, and a density.args attribute that lists the most recent arguments passed to density in the process of making the rvtable.
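To make the attribute layout concrete, the sketch below attaches the attributes named above to an ordinary data frame by hand. It is an assumption-laden mock-up of the metadata only: a real rvtable is built by the constructor and carries the rvtable class, which this example does not reproduce. The names tbl and meta are hypothetical.

```r
# Base R mock-up (assumption: attributes set manually, not by the rvtable constructor)
tbl <- data.frame(Val = 1:5, Prob = c(0.1, 0.2, 0.3, 0.15, 0.25))

attr(tbl, "tabletype")    <- "distribution"  # "distribution" or "sample"
attr(tbl, "rvtype")       <- "discrete"      # "discrete" or "continuous"
attr(tbl, "valcol")       <- "Val"           # name of the values column
attr(tbl, "probcol")      <- "Prob"          # name of the probabilities column
attr(tbl, "density.args") <- list()          # no density call was involved here

# Collect the metadata for inspection
meta <- attributes(tbl)[c("tabletype", "rvtype", "valcol", "probcol")]
print(meta)
```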
If x is already of class rvtable, the rvtable function simply returns it as is; any other arguments passed to rvtable are ignored, and neither the table nor its attributes are updated or altered in any way.
Weights for levels of an ID variable can be set by set_weights.
In rvtable, weights is passed to set_weights so that weights can be set on rvtable construction if desired.
It can be relatively cumbersome to set weights for all ID variables at once in a call to rvtable, and particularly wasteful if there are several ID columns and most or all have levels with equal weights.
The weights attribute assigned to the rvtable will be an empty list if there are no ID columns in x and a named list of data frames otherwise.
The names are the names of the ID variables in x, and each data frame has two columns, levels and weights, giving the weighting of an ID variable's levels.
When weights=NULL or weights=list(), the values in the weights column for each level in each data frame are set to 1 for equal weighting.
Similarly, individual named list elements can be set to NULL for convenience instead of passing a data frame.
NULL or list() elements in the weights list are converted to data frames with weights equal to one.
Note that weights is ignored if x is not a data frame.
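The shape of the weights attribute described above can be sketched in base R. This is a hypothetical reconstruction of the default, equal-weight structure, not the code used by set_weights: the helper default_weights is invented for illustration, and whether the package stores levels as character or factor is an assumption.

```r
# Base R sketch (assumption: default_weights is a hypothetical helper, not part of rvtable)
x <- data.frame(
  id = rep(LETTERS[1:2], each = 10),
  v1 = rep(1:10, 2),
  p1 = c(rep(0.1, 10), sqrt(1:10))
)

# ID columns are all columns other than the Val and Prob columns
id_cols <- setdiff(names(x), c("v1", "p1"))

# One data frame per ID variable, with every level weighted equally at 1
default_weights <- function(data, ids) {
  out <- lapply(ids, function(id) {
    data.frame(
      levels  = unique(as.character(data[[id]])),
      weights = 1,
      stringsAsFactors = FALSE
    )
  })
  setNames(out, ids)
}

w <- default_weights(x, id_cols)
print(w)

# With no ID columns the result is an empty list
w_empty <- default_weights(x[c("v1", "p1")], character(0))
```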
    # basic samples from continuous and discrete RVs
    x <- rnorm(100)
    rvtable(x)
    #> # A tibble: 512 x 2
    #>        x        y
    #>    <dbl>    <dbl>
    #>  1 -3.45 0.000213
    #>  2 -3.44 0.000240
    #>  3 -3.42 0.000269
    #>  4 -3.41 0.000302
    #>  5 -3.39 0.000339
    #>  6 -3.38 0.000379
    #>  7 -3.36 0.000424
    #>  8 -3.34 0.000474
    #>  9 -3.33 0.000528
    #> 10 -3.31 0.000588
    #> # ... with 502 more rows

    rvtable(x, density.args=list(n=50, adjust=2))
    #> # A tibble: 50 x 2
    #>        x        y
    #>    <dbl>    <dbl>
    #>  1 -4.73 0.000191
    #>  2 -4.52 0.000428
    #>  3 -4.30 0.000906
    #>  4 -4.08 0.00181
    #>  5 -3.86 0.00342
    #>  6 -3.64 0.00614
    #>  7 -3.43 0.0104
    #>  8 -3.21 0.0169
    #>  9 -2.99 0.0261
    #> 10 -2.77 0.0385
    #> # ... with 40 more rows

    rvtable(x, discrete=TRUE) # incorrect: x is a continuous RV
    #> # A tibble: 100 x 2
    #>        x      y
    #>    <dbl>  <dbl>
    #>  1 -2.18 0.0100
    #>  2 -2.12 0.0100
    #>  3 -1.87 0.0100
    #>  4 -1.87 0.0100
    #>  5 -1.85 0.0100
    #>  6 -1.80 0.0100
    #>  7 -1.78 0.0100
    #>  8 -1.64 0.0100
    #>  9 -1.47 0.0100
    #> 10 -1.43 0.0100
    #> # ... with 90 more rows

    x <- sample(1:10, size=30, replace=TRUE, prob=sqrt(10:1))
    rvtable(x, discrete=TRUE) # discrete=T only needed if ambiguous
    #> # A tibble: 9 x 2
    #>       x      y
    #>   <dbl>  <dbl>
    #> 1  1.00 0.167
    #> 2  2.00 0.100
    #> 3  3.00 0.133
    #> 4  4.00 0.0333
    #> 5  5.00 0.167
    #> 6  6.00 0.133
    #> 7  7.00 0.133
    #> 8  8.00 0.0667
    #> 9  9.00 0.0667

    x <- 1:5
    probs <- c(0.1, 0.2, 0.3, 0.15, 0.25)
    rvtable(x, y=probs) # discrete inferred from y
    #> # A tibble: 5 x 2
    #>       x     y
    #>   <int> <dbl>
    #> 1     1 0.100
    #> 2     2 0.200
    #> 3     3 0.300
    #> 4     4 0.150
    #> 5     5 0.250

    attr(x, "probabilities") <- probs
    rvtable(x) # inferred from attributes (partial match 'prob')
    #> # A tibble: 5 x 2
    #>       x     y
    #>   <int> <dbl>
    #> 1     1 0.100
    #> 2     2 0.200
    #> 3     3 0.300
    #> 4     4 0.150
    #> 5     5 0.250

    # an existing data frame or data table
    x <- data.frame(Val=1:10, Prob=0.1)
    rvtable(x)
    #> # A tibble: 10 x 2
    #>      Val  Prob
    #>    <int> <dbl>
    #>  1     1 0.100
    #>  2     2 0.100
    #>  3     3 0.100
    #>  4     4 0.100
    #>  5     5 0.100
    #>  6     6 0.100
    #>  7     7 0.100
    #>  8     8 0.100
    #>  9     9 0.100
    #> 10    10 0.100

    x <- data.frame(id=rep(LETTERS[1:2], each=10), v1=rep(1:10, 2), p1=c(rep(0.1, 10), sqrt(1:10)))
    rvtable(x, Val="v1", Prob="p1")
    #> # A tibble: 20 x 3
    #>    id       v1    p1
    #>    <fct> <int> <dbl>
    #>  1 A         1 0.100
    #>  2 A         2 0.100
    #>  3 A         3 0.100
    #>  4 A         4 0.100
    #>  5 A         5 0.100
    #>  6 A         6 0.100
    #>  7 A         7 0.100
    #>  8 A         8 0.100
    #>  9 A         9 0.100
    #> 10 A        10 0.100
    #> 11 B         1 1.00
    #> 12 B         2 1.41
    #> 13 B         3 1.73
    #> 14 B         4 2.00
    #> 15 B         5 2.24
    #> 16 B         6 2.45
    #> 17 B         7 2.65
    #> 18 B         8 2.83
    #> 19 B         9 3.00
    #> 20 B        10 3.16