[Rd] duplicated factor labels.
Paul Johnson
pauljohn32 at gmail.com
Fri Jun 16 18:02:34 CEST 2017
On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys <jorismeys at gmail.com> wrote:
> To extend on Martin's explanation:
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.
>
Dear Joris
I think we agree. Currently, factor insists both levels and labels be unique.
I wish that it would accept nonunique labels. I also understand it
is impractical to change this now in base R.
I don't think I succeeded in explaining why this would be nicer.
Here's another example. Fairly often, we see input data like
x <- c("Male", "Man", "male", "Man", "Female")
The first four represent the same value. I'd like to go in one step
to a new factor variable with enumerated types "Male" and "Female".
This fails:
xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
             labels = c("Male", "Male", "Male", "Female"))
Instead, we need 2 steps.
xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
levels(xf) <- c("Male", "Male", "Male", "Female")
I think it is quirky that `levels<-.factor` allows the duplicated
labels, whereas factor does not.
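For what it's worth, the merge can also be written as one assignment by giving `levels<-` a named list, a form described in ?levels. This is only a minimal sketch on the same x. As far as I understand it, an existing level left out of the list would turn into NA, so all four are listed here.

```r
x <- c("Male", "Man", "male", "Man", "Female")
xf <- factor(x)

# Each list name is a new level; its value names the old levels merged into it.
levels(xf) <- list(Male = c("Male", "Man", "male"), Female = "Female")

levels(xf)
table(xf)
```

The new levels come out in the order of the list names, so the result does not depend on how factor(x) happened to sort the original strings.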
I wrote a function rockchalk::combineLevels to simplify combining
levels, but most of the students here like plyr::mapvalues to do it.
The use of levels() can be tricky because one must enumerate all
values, not just the ones being changed.
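One base R way around enumerating everything is to assign into a subset of levels(xf), so only the levels being changed are named. A minimal sketch, again on the example x above, not a replacement for combineLevels or mapvalues:

```r
x <- c("Male", "Man", "male", "Man", "Female")
xf <- factor(x)

# Rename only the levels being merged; "Female" is never mentioned.
levels(xf)[levels(xf) %in% c("Man", "male")] <- "Male"

levels(xf)
table(xf)
```

The duplicated "Male" entries are collapsed by `levels<-.factor`, which is the same behavior the two-step version relies on.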
But I do understand Martin's point. It's been this way for 25 years; it
won't change. :)
> Cheers
> Joris
>
>
--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
To write to me directly, please address me at pauljohn at ku.edu.