Weighted Data Frames — weighted • midr

weighted() returns a data frame with sample weights.

Usage

weighted(data, weights = NULL)

augmented(data, weights = NULL, size = nrow(data), r = 0.01)

shuffled(data, weights = NULL, size = nrow(data))

latticized(
  data,
  weights = NULL,
  k = 10L,
  type = 0L,
  use.catchall = TRUE,
  catchall = "(others)",
  frames = list(),
  keep.mean = TRUE
)

# S3 method for class 'weighted'
weights(object, ...)

Arguments

data: a data frame.
weights: a numeric vector of sample weights for each observation in data.
size: integer. The number of random observations whose values are sampled from the marginal distribution of each variable.
r: a numeric value specifying the ratio of the total weights for the random observations to the sum of sample weights. The weight for the random observations is calculated as sum(attr(data, "weights")) * r / size.
k: integer. The maximum number of sample points for each variable. If not positive, all unique values are used as sample points.
type: integer. The type of encoding of quantitative variables to be passed to numeric.encoder().
use.catchall: logical. If TRUE, less frequent levels of factor variables are dropped and replaced by the catchall level.
catchall: a character string to be used as the catchall level.
frames: a named list of encoding frames ("numeric.frame" or "factor.frame" objects).
keep.mean: logical. If TRUE, the representative value of each group is the mean of the observations in that group.
object: a data frame with the attribute "weights".
...: not used.
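
The weight formula given for r can be checked by hand. The following is a minimal base R sketch that only evaluates the documented expression sum(attr(data, "weights")) * r / size. The weight vector w and its values are made up for illustration and are not output of the package.

```r
# Hypothetical sample weights for a data frame with 4 rows (illustrative only)
w <- c(1, 2, 3, 4)
size <- length(w)   # mirrors the default size = nrow(data)
r <- 0.01           # mirrors the default ratio

# Weight given to each random observation, per the documented formula
w_random <- sum(w) * r / size

# Total weight of all random observations relative to the original total;
# by construction this equals r
ratio <- (w_random * size) / sum(w)
```

With these values, each random observation receives a weight of 0.025, so the random observations together carry 1% of the total weight of the original data.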

Value

weighted() returns a data frame with the attribute "weights". augmented() returns a weighted data frame that combines the original data with shuffled data, where the shuffled observations carry relatively small weights. shuffled() returns a weighted data frame of the shuffled data. latticized() returns a weighted data frame of latticized data, in which values are grouped and each value is replaced by the representative value of its group.
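
The idea of grouping values and replacing them with a representative value can be sketched in base R for a single numeric variable. This is not the implementation used by latticized(): the quantile-based grouping rule below is an assumption made for illustration, and the names breaks, group and x_lattice are hypothetical.

```r
set.seed(1)
x <- runif(200L, -1, 1)
k <- 10L

# Assumed grouping rule: split at empirical quantiles into at most k groups
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1L)))
group <- cut(x, breaks = breaks, include.lowest = TRUE)

# Analogue of keep.mean = TRUE: replace each value by the mean of its group
x_lattice <- ave(x, group, FUN = mean)
```

After this step x_lattice has at most k distinct values, and because each value is replaced by its group mean, the overall mean of the variable is unchanged.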

Details

weighted() returns a data frame with the "weights" attribute that can be extracted using stats::weights(). augmented(), shuffled() and latticized() return a weighted data frame with some data modifications. These functions are designed for use with interpret(). As the modified data frames do not preserve the original correlation structure of the variables, the response variable (y) should always be replaced by the model predictions (yhat).
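
To see why the correlation structure is lost, consider a base R sketch that draws each column independently from its own observed values. It is not the code used by shuffled(); the helper shuffle_columns is hypothetical, and sampling with replacement is an assumption.

```r
set.seed(1)
x1 <- runif(1000L, -1, 1)
x2 <- x1 + runif(1000L, -1, 1)
x <- data.frame(x1, x2)

# Hypothetical helper: draw each column independently from its observed values
shuffle_columns <- function(data, size = nrow(data)) {
  as.data.frame(lapply(data, function(col) sample(col, size, replace = TRUE)))
}

xs <- shuffle_columns(x)

cor_original <- cor(x$x1, x$x2)    # clearly positive, since x2 is built from x1
cor_shuffled <- cor(xs$x1, xs$x2)  # near zero after independent resampling
```

Each column of xs keeps its marginal distribution, but rows no longer pair x2 with the x1 that produced it. An observed response attached to such rows would be meaningless, which is why the response has to be replaced by predictions computed on the modified rows.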

Examples

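library(midr)

set.seed(42)
# Two correlated variables: x2 depends on x1
x1 <- runif(1000L, -1, 1)
x2 <- x1 + runif(1000L, -1, 1)
# Sample weights; named w to avoid masking stats::weights()
w <- (abs(x1) + abs(x2)) / 2
x <- data.frame(x1, x2)
xw <- weighted(x, w)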
ggplot2::ggplot(xw, ggplot2::aes(x1, x2, alpha = weights(xw))) +
  ggplot2::geom_point() +
  ggplot2::ggtitle("weighted")

# Values of each variable are sampled from its marginal distribution
xs <- shuffled(xw)
ggplot2::ggplot(xs, ggplot2::aes(x1, x2, alpha = weights(xs))) +
  ggplot2::geom_point() +
  ggplot2::ggtitle("shuffled")

# Original data plus shuffled observations with small weights (default r = 0.01)
xa <- augmented(xw)
ggplot2::ggplot(xa, ggplot2::aes(x1, x2, alpha = weights(xa))) +
  ggplot2::geom_point() +
  ggplot2::ggtitle("augmented")

# Values are grouped into at most k sample points per variable (default k = 10L)
xl <- latticized(xw)
ggplot2::ggplot(xl, ggplot2::aes(x1, x2, size = weights(xl))) +
  ggplot2::geom_point() +
  ggplot2::ggtitle("latticized")