[R] Collapse factor levels

David Winsemius dwinsemius at comcast.net
Sun Nov 1 22:21:34 CET 2009


On Nov 1, 2009, at 3:51 PM, Kevin E. Thorpe wrote:

> I'm sure this is simple enough, but an R site search on my subject
> terms did not suggest a solution.  I have a numeric vector with many
> values from which I wish to create a factor having only a few levels.
> Here is a toy example.
>
> > x <- 1:10
> > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))

You have thus created a pathological situation: a factor whose level
labels are duplicated. In R 2.10.0 this is what you might see:

 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

What you _should_ have done was:

  x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))
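
If the values are not already in sorted order, the same idea can be
sketched with a lookup vector indexed by the values. The names `vals`
and `lookup` below are made up for illustration and are not from your
code; this assumes the values are the integer codes 1 to 10.

```r
# Hypothetical data: integer codes 1..10 in arbitrary order
vals <- c(3, 7, 1, 10, 4, 6, 2, 9, 5, 8)

# lookup[i] holds the label wanted for code i (assumed mapping)
lookup <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C")

# Translate each code to its label first, then build the factor,
# so that only the three distinct labels ever become levels
x2 <- factor(lookup[vals])

levels(x2)   # "A" "B" "C"
table(x2)
```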

The usual approach to getting rid of unused factor levels is just to  
apply the function factor() again without additional arguments.

 > x <- factor(x)  # the "x" was from your code
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

# but that will be the last time you will see the warning.

 > summary(x)
A B C
3 3 4
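
Two other routes avoid the duplicated labels altogether. What follows
is only a sketch, not something taken from your code: assigning a named
list to levels() merges old levels into new ones, and cut() bins the
numeric values directly. The break points and the object name `y` are
assumptions chosen to match the toy example.

```r
# Start from a factor with ten distinct levels "1".."10"
x <- factor(1:10)

# Each list element names a new level and lists the old levels
# that should be merged into it
levels(x) <- list(A = as.character(1:3),
                  B = as.character(4:6),
                  C = as.character(7:10))
summary(x)

# Alternative when starting from the numeric values: bin them.
# With the default right = TRUE the intervals are (0,3], (3,6], (6,10]
y <- cut(1:10, breaks = c(0, 3, 6, 10), labels = c("A", "B", "C"))
summary(y)
```

The list form is handy when the grouping is arbitrary; cut() is the
shorter choice when the groups are contiguous numeric ranges.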

-- 
David.
> > x
> [1] A A A B B B C C C C
> Levels: A A A B B B C C C C
> > summary(x)
> A A A B B B C C C C
> 3 0 0 3 0 0 4 0 0 0
>
> So, there are clearly still 10 underlying levels.  The results I would
> like to see from printing the value and summary(x) are:
>
> > x
> [1] A A A B B B C C C C
> Levels: A B C
> > summary(x)
> A B C
> 3 3 4
>
> Hopefully this makes sense.
>
> Thanks,
>
> Kevin
>
> -- 
> Kevin E. Thorpe
> Biostatistician/Trialist, Knowledge Translation Program
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



