Encoder for Quantitative Variables — numeric.encoder • midr

numeric.encoder() returns an encoder for a quantitative variable.

Usage

numeric.encoder(
  x,
  k,
  type = 1L,
  encoding.digits = NULL,
  tag = "x",
  frame = NULL,
  weights = NULL
)

numeric.frame(
  reps = NULL,
  breaks = NULL,
  type = NULL,
  encoding.digits = NULL,
  tag = "x"
)

# S3 method for class 'encoder'
print(x, digits = NULL, ...)

Arguments

x: a numeric vector to be encoded.
k: an integer specifying the coarseness of the encoding. If not positive, all unique values of x are used as sample points.
type: an integer specifying the encoding method. If 1, values are encoded to a [0, 1] scale based on linear interpolation between the knots. If 0, values are encoded to 0 or 1 using one-hot encoding on the intervals.
encoding.digits: an integer specifying the number of digits to which the encoded values are rounded when type is 1.
tag: a character string specifying the name of the variable.
frame: a "numeric.frame" object or a numeric vector that defines the sample points of the binning.
weights: an optional numeric vector of sample weights, one for each value of x.
reps: a numeric vector to be used as the representative values (knots).
breaks: a numeric vector to be used as the binning breaks.
digits: the minimum number of significant digits to be used.
...: not used.

Value

numeric.encoder() returns a list containing the following components:

frame: an object of class "numeric.frame".
encode: a function to encode x into a dummy matrix.
n: the number of encoding levels.
type: the type of encoding, "linear" or "constant".

numeric.frame() returns a "numeric.frame" object containing the encoding information.

Details

numeric.encoder() selects sample points from the variable x and returns a list whose encode() function converts a numeric vector into a dummy matrix. If type is 1, k is treated as the maximum number of knots, and a value lying between two knots is encoded as two fractional weights, one for each neighboring knot, that reflect its relative position between them. If type is 0, k is treated as the maximum number of intervals, and each value is one-hot encoded according to the interval it falls in.
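
The linear (type 1) encoding described above can be illustrated with a minimal base R sketch. The helper linear_encode_sketch() is hypothetical and is not part of midr. It assumes that values outside the knot range receive full weight on the nearest end knot and that NA is encoded as an all-zero row, which matches the printed output in the Examples section but is not confirmed as the package's actual implementation.

```r
# Hypothetical helper, not part of midr: piecewise-linear encoding on knots.
linear_encode_sketch <- function(x, knots) {
  knots <- sort(unique(knots))
  n <- length(knots)
  out <- matrix(0, nrow = length(x), ncol = n,
                dimnames = list(NULL, as.character(knots)))
  for (i in seq_along(x)) {
    xi <- x[i]
    if (is.na(xi)) next                        # assumption: NA stays all-zero
    if (xi <= knots[1]) { out[i, 1] <- 1; next }  # at or below the first knot
    if (xi >= knots[n]) { out[i, n] <- 1; next }  # at or above the last knot
    j <- findInterval(xi, knots)               # knots[j] <= xi < knots[j + 1]
    w <- (xi - knots[j]) / (knots[j + 1] - knots[j])
    out[i, j] <- 1 - w                         # weight on the left knot
    out[i, j + 1] <- w                         # weight on the right knot
  }
  out
}

m <- linear_encode_sketch(c(4:8, NA), knots = c(4.3, 5.1, 5.8, 6.4, 7.9))
round(m, 3)
```

Under these assumptions, each non-missing row sums to 1, and a value that coincides with a knot gets weight 1 on that knot.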

Examples

data(iris, package = "datasets")
enc <- numeric.encoder(x = iris$Sepal.Length, k = 5L, tag = "Sepal.Length")
enc$frame
#>   Sepal.Length Sepal.Length_min Sepal.Length_max
#> 1          4.3             4.30             4.70
#> 2          5.1             4.70             5.45
#> 3          5.8             5.45             6.10
#> 4          6.4             6.10             7.15
#> 5          7.9             7.15             7.90
enc$encode(x = c(4:8, NA))
#>        4.3   5.1       5.8       6.4 7.9
#> [1,] 1.000 0.000 0.0000000 0.0000000 0.0
#> [2,] 0.125 0.875 0.0000000 0.0000000 0.0
#> [3,] 0.000 0.000 0.6666667 0.3333333 0.0
#> [4,] 0.000 0.000 0.0000000 0.6000000 0.4
#> [5,] 0.000 0.000 0.0000000 0.0000000 1.0
#> [6,] 0.000 0.000 0.0000000 0.0000000 0.0
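
The _min and _max columns printed for enc$frame above are consistent with midpoints between adjacent knots, with the outer limits set to the first and last knot. The following base R sketch reproduces those numbers under that assumption. It is an observation about the printed output, not a statement of how midr computes the frame, and the column names used here are illustrative.

```r
# Knots copied from the enc$frame output above.
knots <- c(4.3, 5.1, 5.8, 6.4, 7.9)
n <- length(knots)

# Assumption: interval limits are midpoints between neighboring knots.
mids <- (knots[-n] + knots[-1]) / 2

frame_sketch <- data.frame(
  knot = knots,
  min  = c(knots[1], mids),   # first interval starts at the first knot
  max  = c(mids, knots[n])    # last interval ends at the last knot
)
frame_sketch
```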

frm <- numeric.frame(breaks = seq(3, 9, 2), type = 0L)
enc <- numeric.encoder(x = iris$Sepal.Length, frame = frm)
enc$encode(x = c(4:8, NA))
#>      [-Inf, 5) [5, 7) [7, Inf)
#> [1,]         1      0        0
#> [2,]         0      1        0
#> [3,]         0      1        0
#> [4,]         0      0        1
#> [5,]         0      0        1
#> [6,]         0      0        0
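
The one-hot (type 0) output above can be approximated with a short base R sketch. The helper onehot_encode_sketch() is hypothetical and is not part of midr. It assumes, based on the column labels printed above, that the outermost bins are open-ended (the first and last breaks are replaced by -Inf and Inf), that intervals are closed on the left, and that NA is encoded as an all-zero row.

```r
# Hypothetical helper, not part of midr: one-hot encoding on intervals.
onehot_encode_sketch <- function(x, breaks) {
  breaks <- sort(unique(breaks))
  nb <- length(breaks)
  # Assumption: the outermost breaks are dropped and the end bins are open-ended.
  edges <- c(-Inf, breaks[-c(1, nb)], Inf)
  n <- length(edges) - 1
  labels <- paste0("[", edges[-(n + 1)], ", ", edges[-1], ")")
  out <- matrix(0, nrow = length(x), ncol = n,
                dimnames = list(NULL, labels))
  idx <- findInterval(x, edges, all.inside = TRUE)  # NA in x gives NA
  ok <- !is.na(idx)
  out[cbind(which(ok), idx[ok])] <- 1               # one 1 per non-missing row
  out
}

oh <- onehot_encode_sketch(c(4:8, NA), breaks = seq(3, 9, 2))
oh
```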

enc <- numeric.encoder(x = iris$Sepal.Length, frame = seq(3, 9, 2))
enc$encode(x = c(4:8, NA))
#>        3   5   7   9
#> [1,] 0.5 0.5 0.0 0.0
#> [2,] 0.0 1.0 0.0 0.0
#> [3,] 0.0 0.5 0.5 0.0
#> [4,] 0.0 0.0 1.0 0.0
#> [5,] 0.0 0.0 0.5 0.5
#> [6,] 0.0 0.0 0.0 0.0