[R] Subset doesn't drop unused factor levels

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Oct 7 20:38:45 CEST 2004


hadley wickham wrote:
> a <- data.frame(b = rep(1:5, each=2), c=factor(rep("a",10), levels=c("a","b")))
> levels(subset(a, b == 1, drop = TRUE)$c)
> # [1] "a" "b"
> 
> Is this a bug?
> 
> Thanks,
> 
> Hadley
> 

This is always controversial.  I am apparently in the small minority in 
believing that the default behavior should be what you are wishing for, 
i.e. that subsetting should drop unused factor levels.  That's why the 
Hmisc package by default drops unused levels (but allows you to override 
that with options(drop.unused.levels=FALSE)).  It is distasteful to have 
to override system behavior, but I felt I had to in this case.  No one 
in R-core wanted to add a non-default option to R, e.g. 
options(drop.unused.levels=TRUE).
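
For what it's worth, here is a minimal sketch of dropping the unused 
levels by hand in base R, without Hmisc.  It assumes the data frame a 
from the example above; the names s and f are just illustrative.  
Re-applying factor() to the column, or passing drop=TRUE when 
subscripting the factor itself, should both do it:

```r
# Example data from the question: c has levels "a" and "b",
# but only "a" actually occurs
a <- data.frame(b = rep(1:5, each = 2),
                c = factor(rep("a", 10), levels = c("a", "b")))

# Subsetting rows keeps the full level set, used or not
s <- subset(a, b == 1)
levels(s$c)
# [1] "a" "b"

# Re-applying factor() rebuilds the levels from the values present
s$c <- factor(s$c)
levels(s$c)
# [1] "a"

# For a bare factor, drop = TRUE in the subscript does the same
f <- a$c[a$b == 1, drop = TRUE]
levels(f)
# [1] "a"
```

Note that drop=TRUE in subset() concerns the dimensions of the result 
(it is passed on to the data frame indexing), not the factor levels, 
which is why it has no effect on levels() in the original example.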

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



