[R] how to collapse categories or re-categorize variables?

Phil Spector spector at stat.berkeley.edu
Sat Jul 17 23:15:13 CEST 2010


Please look at Peter Dalgaard's response a little more
carefully.  There's a big difference between the levels=
argument (whose values must be unique) and the labels=
argument (whose values need not be).  Here are two ways
to do what you want:

> d = 0:2
> factor(d,levels=0:2,labels=c('0','1','1'))
[1] 0 1 1
> library(car)
> recode(d,"c(1,2)='1'")
[1] 0 1 1
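
To apply the same idea to every column of a data frame, a minimal
sketch along the lines of the levels<- workaround quoted below might
look like this.  The data frame dat, its column names, and the helper
collapse12 are made up for illustration; the sketch assumes every
column is coded 0, 1, or 2.

```r
# Hypothetical example data: three columns coded 0/1/2 (names are made up).
dat <- data.frame(v1 = c(0, 1, 2, 2),
                  v2 = c(2, 0, 1, 0),
                  v3 = c(1, 1, 0, 2))

# Assigning a levels vector with repeats to an existing factor merges the
# levels that share a name, so the old "1" and "2" both become "1".
collapse12 <- function(x) {
  f <- factor(x, levels = 0:2)
  levels(f) <- c("0", "1", "1")
  f
}

# Loop over column positions rather than column names, as suggested
# elsewhere in this thread for data frames with very many columns.
for (i in seq_along(dat)) {
  dat[[i]] <- collapse12(dat[[i]])
}

levels(dat$v1)   # inspect the collapsed levels
table(dat$v1)    # counts per collapsed category
```

Doing the merge through levels<- rather than through labels= in
factor() sidesteps the duplicated-levels warning shown in the quoted
message.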


 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


On Sat, 17 Jul 2010, Peter Dalgaard wrote:

> Ista Zahn wrote:
>> Hi,
>> On Fri, Jul 16, 2010 at 5:18 PM, CC <turtysmail at gmail.com> wrote:
>>> I am sure this is a very basic question:
>>>
>>> I have 600,000 categorical variables in a data.frame - each of which is
>>> classified as "0", "1", or "2"
>>>
>>> What I would like to do is collapse "1" and "2" and leave "0" by itself,
>>> such that after re-categorizing "0" = "0"; "1" = "1" and "2" = "1" --- in
>>> the end I only want "0" and "1" as categories for each of the variables.
>>
>> Something like this should work
>>
>> for (i in names(dat)) {
>> dat[, i]  <- factor(dat[, i], levels = c("0", "1", "2"), labels =
>> c("0", "1", "1"))
>> }
>
> Unfortunately, it won't:
>
>> d <- 0:2
>> factor(d, levels=c(0,1,1))
> [1] 0    1    <NA>
> Levels: 0 1 1
> Warning message:
> In `levels<-`(`*tmp*`, value = c("0", "1", "1")) :
>  duplicated levels will not be allowed in factors anymore
>
>
> This effect, I have been told, goes way back to design choices in S
> (that you can have repeated level names) plus compatibility ever since.
>
> It would make more sense if it behaved like
>
> d <- factor(d); levels(d) <- c(0,1,1)
>
> and maybe, some time in the future, it will. Meanwhile, the above is the
> workaround.
>
> (BTW, if there are 600000 variables, you probably don't want to iterate
> over their names, more likely "for(i in seq_along(dat))...")
>
> -- 
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


