[R] dropping factor levels in subset

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jun 27 19:11:05 CEST 2003


Re: [, drop=TRUE] for factors

It's been in S-PLUS (but not S I believe) for a long time, probably since
before 1994: it is in S+3.4, 1996 vintage.

It appears to have been added to R around August 1998.

Yes, Frank Harrell argues for the default to be true, and I believe his
Hmisc package overrides this.  Although less unsafe than it used to be (a
lot more consistency checking of factor levels has been added), it is
still, I believe, undesirable.  The argument `drop.unused.levels' to
model.frame will usually do all that is required.  (That's another thing
that is very little known.)
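
As a minimal sketch of the two approaches (the data frame `d`, its columns
`g` and `y`, and the subset `sub` are made up purely for illustration, and
this assumes a reasonably current R):

```r
# Hypothetical data: a factor with three levels and a numeric response.
d <- data.frame(
  g = factor(c("a", "a", "b", "b", "c", "c")),
  y = c(1.0, 1.2, 2.1, 1.9, 3.0, 3.2)
)

# Subsetting rows removes every "c" observation but keeps the level itself.
sub <- d[d$g != "c", ]
levels(sub$g)                 # "a" "b" "c"

# Dropping the unused level explicitly on the factor.
dropped <- sub$g[drop = TRUE]
levels(dropped)               # "a" "b"

# Leaving the data untouched and dropping only when the model frame is built.
mf <- model.frame(y ~ g, data = sub, drop.unused.levels = TRUE)
levels(mf$g)                  # "a" "b"

# The stored data still carry all three levels.
levels(sub$g)                 # "a" "b" "c"
```

The second route leaves the factor in the data frame as it was, so the
unused level is only discarded for the fit in question.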

On Fri, 27 Jun 2003, Marc Schwartz wrote:

> >-----Original Message-----
> >From: r-help-bounces at stat.math.ethz.ch 
> >[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof 
> >Brian Ripley
> >Sent: Friday, June 27, 2003 1:35 AM
> >To: Marc Schwartz
> >Cc: r-help at stat.math.ethz.ch; 'Nick Bond'
> >Subject: RE: [R] dropping factor levels in subset
> >
> >
> >A more transparent solution is
> >
> >old.factor[1:3, drop = TRUE]
> >
> >That has worked for a long time, but apparently not been 
> >documented in R
> >until 1.7.1 (docs added a couple of hours before release). So 
> >you could do
> >(probably, since there are some bugs prior to 1.8.0)
> >
> >crb[] <- lapply(crb, function(x) x[drop=TRUE])
> >
> >to remove the unused levels on all factors in the data frame.
> 
> SNIP
> 
> >
> 
> Prof. Ripley,
> 
> Thank you for pointing this out.  I checked both ?factor and ?"[" and
> note that this behavior is now documented.
> 
> A question:  How long (roughly) has this been present in R for
> factors? 
> 
> I ask because I had a vague recollection this morning, after seeing
> your reply, of an exchange between Frank Harrell and others regarding
> just such a 'feature' in R some time ago.  It turns out to have been
> back in January of 2002 based upon my search of the r-help archive
> this morning
> (http://maths.newcastle.edu.au/~rking/R/help/01c/3809.html). In this
> exchange, Frank suggested using just such an approach (ie. "x <-
> x[,drop=T]") for factor objects, whereas Peter in that same thread
> noted the use of 'x <- factor(x)' in his reply, which is what I tend
> to use. If my re-read of the thread is correct, I believe that Frank
> was also arguing in favor of a global options() setting regarding this
> behavior.
> 
> A recent (May 2003) exchange between Duncan Murdoch and John Chambers
> (http://maths.newcastle.edu.au/~rking/R/devel/03a/1003.html) would
> suggest that such a feature was present for vectors, but perhaps
> incompletely documented as you perhaps suggest, given Duncan's
> question if my read of the exchange is correct.
> 
> I now note that for factor objects, this is included in MASS 4 (pg
> 19), whereas it is a footnote in MASS 3 (pg 20) and I could not find
> it in MASS 1 (I don't have a copy of MASS 2 to review). It is also a
> footnote in S Programming (pg 14). Not sure if any significance should
> be attached to being a footnote versus being in the body of the text. 

None.

> 
> Lastly, I note that references to "[" in the "White Book" include a
> 'drop' argument on pg 465 and in the "Green Book" on pg 340, which
> would suggest that it has been around for some time, at least as a
> high level method, though with no specific reference that I could note
> for factor objects.
> 
> Regards and thanks,
> 
> Marc
> 
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



