factor.encoder() returns an encoder for a qualitative variable.
Usage
factor.encoder(
  x,
  k,
  use.catchall = TRUE,
  catchall = "(others)",
  tag = "x",
  frame = NULL,
  weights = NULL
)
factor.frame(levels, catchall = "(others)", tag = "x")
Arguments
- x
a vector to be encoded as a qualitative variable.
- k
an integer specifying the maximum number of levels, including the catchall level if one is used. If not positive, all unique values of x are used as levels.
- use.catchall
logical. If TRUE, less frequent levels are dropped and replaced by the catchall level.
- catchall
a character string to be used as the catchall level.
- tag
a character string giving the name of the variable.
- frame
a "factor.frame" object or a character vector that defines the levels of the variable.
- weights
optional. A numeric vector of sample weights, one for each value of x.
- levels
a vector to be used as the levels of the variable.
Value
factor.encoder() returns a list containing the following components:
- frame
an object of class "factor.frame".
- encode
a function that encodes a vector such as x into a dummy matrix.
- n
the number of encoding levels.
- type
the type of encoding.
factor.frame() returns a "factor.frame" object containing the encoding information.
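The list components above can be mimicked with a closure. The base-R sketch below is hypothetical: make_encoder() is not part of the package, the layout of its frame only loosely mirrors the printed enc$frame in the Examples, it assumes that values outside the levels (including NA) go to the catchall column when one is given and otherwise produce all-zero rows (as the Examples show), and it omits the type component because its value is not documented on this page.

```r
# Hypothetical sketch, not the package's implementation: build a list with
# `frame`, `encode` and `n`, where `encode` is a closure over the levels.
make_encoder <- function(levs, tag = "x", catchall = NULL) {
  all_levs <- c(levs, catchall)  # catchall (if any) becomes the last column
  # Frame layout loosely mirrors the printed enc$frame in the Examples
  frame <- data.frame(levs, seq_along(levs))
  names(frame) <- c(tag, paste0(tag, "_level"))
  encode <- function(x) {
    x <- as.character(x)
    idx <- match(x, levs)  # NA for unknown values and for NA itself
    if (!is.null(catchall)) idx[is.na(idx)] <- length(all_levs)
    mat <- matrix(0, nrow = length(x), ncol = length(all_levs),
                  dimnames = list(NULL, all_levs))
    hit <- !is.na(idx)
    mat[cbind(which(hit), idx[hit])] <- 1  # one 1 per matched row
    mat
  }
  list(frame = frame, encode = encode, n = length(all_levs))
}

enc <- make_encoder(c("setosa", "virginica"), tag = "Species",
                    catchall = "other iris")
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
```

Because encode() captures levs when the encoder is built, later calls encode new data against the original levels, so an unseen value such as "ensata" can never add a column.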
Details
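factor.encoder() extracts the unique values (levels) from the vector x and returns a list containing the encode() function, which converts a vector into a dummy matrix using one-hot encoding.
If use.catchall is TRUE and the number of levels exceeds k, only the k - 1 most frequent levels are kept, and all other values are mapped to the catchall level.
As the Examples show, values that do not match any level, including NA, are encoded as a row of zeros, or as 1 in the catchall column when the frame includes a catchall level.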
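The level-selection rule can be sketched in base R. select_levels() is a hypothetical helper written for illustration only; it assumes that weights, when supplied, are summed per value to rank the levels, that NA is not counted as a level, and that ties are broken alphabetically. It covers only the use.catchall = TRUE case.

```r
# Hypothetical helper, not the package's implementation: choose the levels
# for a catchall encoding from (optionally weighted) value frequencies.
select_levels <- function(x, k, catchall = "(others)", weights = NULL) {
  x <- as.character(x)
  if (is.null(weights)) weights <- rep(1, length(x))
  keep <- !is.na(x)  # NA is not counted as a level here
  # Assumption: frequency is the sum of weights per distinct value
  freq <- tapply(weights[keep], x[keep], sum)
  # Most frequent first; ties broken alphabetically (also an assumption)
  levs <- names(freq)[order(-freq, names(freq))]
  if (k <= 0 || length(levs) <= k) return(levs)
  c(levs[seq_len(k - 1)], catchall)  # k - 1 levels plus the catchall
}

data(iris, package = "datasets")
select_levels(iris$Species, k = 2)  # returns c("setosa", "(others)")
```

All three species occur 50 times in iris, so with k = 2 the alphabetical tie-break decides that setosa is the one level kept alongside the catchall.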
Examples
data(iris, package = "datasets")
# Use every species as a level (no catchall)
enc <- factor.encoder(x = iris$Species, use.catchall = FALSE, tag = "Species")
enc$frame
#>      Species Species_level
#> 1     setosa             1
#> 2 versicolor             2
#> 3  virginica             3
# Values outside the levels ("ensata") and NA give all-zero rows
enc$encode(x = c("setosa", "virginica", "ensata", NA, "versicolor"))
#>      setosa versicolor virginica
#> [1,]      1          0         0
#> [2,]      0          0         1
#> [3,]      0          0         0
#> [4,]      0          0         0
#> [5,]      0          1         0
# Keep setosa and virginica; all other values go to the catchall "other iris"
frm <- factor.frame(c("setosa", "virginica"), catchall = "other iris")
enc <- factor.encoder(x = iris$Species, frame = frm)
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
#>      setosa virginica other iris
#> [1,]      1         0          0
#> [2,]      0         1          0
#> [3,]      0         0          1
#> [4,]      0         0          1
#> [5,]      0         0          1
# A character vector frame defines the levels without a catchall
enc <- factor.encoder(x = iris$Species, frame = c("setosa", "versicolor"))
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
#>      setosa versicolor
#> [1,]      1          0
#> [2,]      0          0
#> [3,]      0          0
#> [4,]      0          0
#> [5,]      0          1