[Rd] A suggestion for an amendment to tapply
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Nov 6 08:23:56 CET 2007
On Tue, 6 Nov 2007, Bill.Venables at csiro.au wrote:
> Unfortunately I think it would break too much existing code. tapply()
> is an old function and many people have gotten used to the way it works
> now.
It is also not necessarily desirable: FUN(numeric(0)) might be an error.
For example:
  Z <- data.frame(x = rnorm(10), f = factor(rep(c("a", "b"), each = 5)))[1:5, ]
  tapply(Z$x, Z$f, sd)
Level "b" is empty here, so the proposed change would evaluate sd(numeric(0)),
which is an error. (Similar code involving var is 'in the wild' and so would
be broken.)
> This is not to suggest there could not be another argument added at the
> end to indicate that you want the new behaviour, though. e.g.
>
> tapply <- function (X, INDEX, FUN=NULL, ..., simplify=TRUE,
> handle.empty.levels = FALSE)
>
> but this raises the question of what sort of time penalty the
> modification might entail. Probably not much for most situations, I
> suppose. (I know this argument name looks long, but you do need a
> fairly specific argument name, or it will start to impinge on the ...
> argument.)
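A minimal sketch of the collision risk described above. It uses a hypothetical wrapper, `apply_groups`, whose extra argument is deliberately given the common name `na.rm`. In R, formals placed after `...` are matched only by their exact name, so a short or common name silently captures an argument the caller meant to pass through to FUN:

```r
# Hypothetical wrapper: its extra argument is (badly) named 'na.rm'.
apply_groups <- function(X, INDEX, FUN, ..., na.rm = FALSE) {
  # 'na.rm' is matched here, so it never reaches FUN through '...'.
  tapply(X, INDEX, FUN, ...)
}

x <- c(1, NA, 3, 4)
f <- factor(c("a", "a", "b", "b"))

apply_groups(x, f, sum, na.rm = TRUE)  # a = NA, b = 7: sum never saw na.rm
tapply(x, f, sum, na.rm = TRUE)        # a = 1,  b = 7: na.rm passed to sum
```

A name as specific as handle.empty.levels is unlikely to clash with an argument of any FUN in common use.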
>
> Just some thoughts.
>
> Bill Venables.
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:Bill.Venables at csiro.au
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Robinson
> Sent: Tuesday, 6 November 2007 3:10 PM
> To: R-Devel
> Subject: [Rd] A suggestion for an amendment to tapply
>
> Dear R-developers,
>
> When tapply() is invoked on factors that have empty levels, it returns
> NA for those levels. This behaviour is in accord with the tapply
> documentation, and is
> reasonable in many cases. However, when FUN is sum, it would also
> seem reasonable to return 0 instead of NA, because "the sum of an
> empty set is zero, by definition."
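A minimal illustration of the behaviour described above, using made-up data in which level "c" of the factor has no observations:

```r
# Made-up example data: factor 'f' has an unused level "c".
x <- c(1, 2, 3, 4)
f <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

tapply(x, f, sum)  # a = 3, b = 7, c = NA: the empty level gets NA
sum(numeric(0))    # 0: the sum of an empty set
```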
>
> I'd like to open a discussion about a possible amendment to tapply.
>
> The attached patch changes the function so that it checks whether there
> are any empty levels and, if there are, replaces the corresponding NA
> values with the result of applying FUN to the empty set. E.g., in the
> case of sum, it replaces the NA with 0, whereas with mean, it replaces
> the NA with NA and issues a warning.
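A minimal sketch of the idea, not the attached patch (which is not reproduced here). The helper name `tapply_empty` is hypothetical. The sketch assumes that "the empty set" means a zero-length subset `X[0]` of the data, that INDEX is a single factor, and that FUN returns a length-one result. Note that `mean(numeric(0))` is NaN with no warning, which differs from the NA-plus-warning result described above for mean. This suggests the patch builds its empty input differently. If FUN fails on empty input, as Prof Ripley notes sd(numeric(0)) does, this sketch fails too.

```r
# Hypothetical sketch, not the attached patch: fill the cells of empty
# factor levels with FUN applied to a zero-length subset of X.
# Assumes INDEX is a single factor and FUN returns a length-one result.
tapply_empty <- function(X, INDEX, FUN, ...) {
  FUN <- match.fun(FUN)
  res <- tapply(X, INDEX, FUN, ...)
  # Levels with no observations, in the same order as the cells of 'res'.
  empty <- as.vector(table(INDEX) == 0)
  if (any(empty)) {
    # FUN is evaluated once; any warning or error it raises propagates.
    res[empty] <- FUN(X[0], ...)
  }
  res
}

x <- c(1, 2, 3, 4)
f <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))
tapply_empty(x, f, sum)   # a = 3, b = 7, c = 0
tapply_empty(x, f, mean)  # a = 1.5, b = 3.5, c = NaN
```

Evaluating FUN only once, and only when some level is empty, keeps the extra cost to a single pass over INDEX in `table()`.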
>
> This change has the following advantage: tapply and sum work better
> together. Arguably, tapply and any other function that has a non-NA
> response to the empty set will also work better together.
> Furthermore, tapply shows a warning if FUN would normally show a
> warning upon being evaluated on an empty set. That deviates from
> current behaviour, which might be bad, but it also gives the user
> potentially useful information, which would be good.
>
> The attached script provides the new function in full, and
> demonstrates its application in some simple test cases.
>
> Best wishes,
>
> Andrew
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595