[Rd] duplicated factor labels.
jorismeys at gmail.com
Fri Jun 23 14:57:42 CEST 2017
On Fri, Jun 23, 2017 at 2:20 PM, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
> I had the chance to look at > 1300 SPSS files our consulting center
> collected during the last 20 years, and in several hundred cases we found
> such a problem that was a copy & paste error and simply wrong.
> Only in < 5 cases condensing several levels into one was appropriate,
> hence we decided to keep duplicated levels by changing the names as the
I understand where you're coming from. I know from personal experience
exactly how much of a pain in the ass this is, but I also have to group
different labels into fewer categories in almost every data set I get from
clients or students, especially when things come from surveys with 30
different education categories etc.
So I would argue that checking for duplicate labels is a task for
read.spss() and can be added as an extra check if necessary. But I
personally don't see the fact that clients regularly mess up SPSS files as
enough of an argument to not change the behaviour of factor().
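
A minimal sketch of what such an extra check could look like, assuming the value labels are available as a named vector (names are the labels, values are the codes). The helper name find_duplicate_labels() and the example data are hypothetical; this is not code from read.spss() or any package.

```r
# Hypothetical helper, not part of base R or the foreign package.
# `value_labels` is a named vector: names are labels, values are codes.
find_duplicate_labels <- function(value_labels) {
  labs <- names(value_labels)
  dup <- unique(labs[duplicated(labs)])
  # For each duplicated label, collect the codes that share it
  lapply(setNames(dup, dup), function(l) unname(value_labels[labs == l]))
}

# Made-up example: codes 3 and 4 both carry the label "Master"
value_labels <- c("Primary" = 1, "Secondary" = 2, "Master" = 3, "Master" = 4)
dups <- find_duplicate_labels(value_labels)
if (length(dups) > 0) {
  message("duplicated value labels: ", paste(names(dups), collapse = ", "))
}
```

Reporting the duplicates at import time leaves it to the user to decide whether they are a copy & paste error or an intended grouping.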
> Based on this experience I'd propose not to touch factor but rather add a
> function that easily allows for this reduction, if we do not have that
There are already functions that allow you to do this, like
dplyr::recode_factor() from the tidyverse. It's rather trivial to do this
with logical operators and indices, and I have my own "recode" function so I
don't have to rely on any package or retype the same construct over and
over again with different values.
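
For illustration, two base R ways of grouping levels after the factor has been built, one with logical indices and one with a named list passed to levels<-. The data and level names are made up, and this is only a sketch of the general idea, not the "recode" function mentioned above.

```r
# Made-up survey answers
education <- factor(c("primary", "bachelor", "master", "secondary", "phd"))

# Option 1: overwrite selected levels via a logical index;
# levels that end up with the same name are merged
grouped1 <- education
levels(grouped1)[levels(grouped1) %in% c("primary", "secondary")] <- "school"
levels(grouped1)[levels(grouped1) %in% c("bachelor", "master", "phd")] <- "higher"

# Option 2: assign a named list (new level = old levels) to levels()
grouped2 <- education
levels(grouped2) <- list(
  school = c("primary", "secondary"),
  higher = c("bachelor", "master", "phd")
)

table(grouped2)
```

Both options give the same grouping; the list form also fixes the order of the new levels.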
But a clean and logical way to recode/group different levels when
constructing the factor would, at least for me, be very convenient. But
I'm just a guy and I'm not writing the code, so in the end it's up to you
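
A sketch of grouping at construction time, assuming an R version in which factor() accepts duplicated labels (to my knowledge 3.4.0 and later; older versions are expected to raise an error here). The codes and labels are made up.

```r
# Assumes factor() accepts duplicated labels (believed to hold for R >= 3.4.0)
codes <- c(1, 2, 3, 4, 2)

# Codes 1 and 2 map to "school", codes 3 and 4 map to "higher"
edu <- factor(codes, levels = 1:4,
              labels = c("school", "school", "higher", "higher"))

levels(edu)
table(edu)
```

The levels themselves must still be unique; only the labels may repeat.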
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics
tel : +32 (0)9 264 61 79
Joris.Meys at Ugent.be