[R] Creating subsets with factors
Frank E Harrell Jr
fharrell at virginia.edu
Wed Jan 9 13:53:41 CET 2002
I respectfully disagree with Peter. In all the data analysis I have done I have found that 0.99 of the time it is most convenient to have unused levels dropped upon subsetting. In my work I do this by default. I realize that system overrides are to be avoided at almost all costs, but [.factor is the only function I override for R. I print a message saying that the traditional behavior may be obtained by using options(drop.unused.levels=FALSE). I have lobbied for S-Plus and R developers to adopt this approach, albeit with the DEFAULT being drop.unused.levels=FALSE (i.e., users would say options(drop.unused.levels=TRUE) to get my behavior), but an insufficient number of people seem to agree with me on this point.
A slightly more logical way to drop unused levels for the current setup is
x <- x[, drop=TRUE]
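As a minimal sketch of the difference, the lines below use a small made-up factor (hypothetical data, loosely modelled on the vp levels in the original question; the names sub, dropped1 and dropped2 are only for illustration). Plain subsetting keeps the unused level, while either the drop=TRUE form or a call to factor() removes it.

```r
# Hypothetical data resembling the factor in the original question
vp <- factor(rep(c("p1", "p1ab", "p1br"), each = 2))

# Plain subsetting keeps every level, including the now-unused "p1"
sub <- vp[vp != "p1"]
levels(sub)

# Either of these keeps only the levels that still occur in the data
dropped1 <- sub[, drop = TRUE]
dropped2 <- factor(sub)
levels(dropped1)
levels(dropped2)
```

Both forms leave the data values untouched; only the set of levels changes.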
I have not needed to use c(f1, f2), but it seems to me that Peter's example points more to a deficiency in c, or to the need for another binding function for this case (which can be done with factor(c(as.character(f1),as.character(f2))), depending on how NAs are handled).
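A rough sketch of that binding approach follows, using two made-up factors f1 and f2 with different level sets (the data and the names f12 and f12_na are assumptions for illustration only). It also shows the two obvious choices for NAs: left as missing values, or kept as a level of their own via exclude=NULL.

```r
# Hypothetical factors with different level sets; f2 contains a missing value
f1 <- factor(c("a", "b"))
f2 <- factor(c("b", "c", NA))

# Convert to character, concatenate, then rebuild the factor.
# The levels become the sorted unique values found in the combined data.
f12 <- factor(c(as.character(f1), as.character(f2)))
levels(f12)
is.na(f12)

# By default NA stays a missing value and is not a level.
# With exclude = NULL, NA is kept as a level instead.
f12_na <- factor(c(as.character(f1), as.character(f2)), exclude = NULL)
levels(f12_na)
```

Going through as.character() means the result depends only on the values present, so unused levels in f1 or f2 do not carry over.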
Frank Harrell
On 09 Jan 2002 11:07:52 +0100
Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> wrote:
> Sven Garbade <garbade at psy.uni-muenchen.de> writes:
>
> > Hi all,
> >
> > I don't understand the following output. I've created a data subset from
> > a data frame by
> >
> > > p1.sub <- subset(p1.dat, vp!="p1")
> >
> > this is ok. But
> >
> > > attach(p1.sub)
> > > vp
> > [1] p1ab p1ab p1ab p1ab p1ab p1br p1br p1br p1br p1br p1kf p1kf p1kf p1kf p1kf
> > [16] p1mg p1mg p1mg p1mg p1mg p1mw p1mw p1mw p1mw p1mw
> > Levels: p1 p1ab p1br p1kf p1mg p1mw
> >
> > shows me that the factor vp has 6 levels instead of 5? 5 should be the
> > correct number of levels, because p1 isn't in the data subset.
>
> Nope. Factors can have levels that are not present in the data set.
> There are good reasons for this. For instance you cannot c(f1,f2) if
> f1 and f2 are factors with different level sets.
>
> If you want to reduce the levels to those present in the factor, use
>
> p1.sub$vp <- factor(p1.sub$vp)
>
> --
> O__ ---- Peter Dalgaard Blegdamsvej 3
> c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat