[Rd] aggregate(empty data.frame) (PR#13167)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Oct 15 14:15:29 CEST 2008
On Wed, 15 Oct 2008, prokaj at cs.elte.hu wrote:
> Full_Name: Vilmos Prokaj
> Version: R-2.7.1
> OS: Win XP
> Submission from: (NULL) (157.181.227.218)
>
>
> The 'aggregate' function on an empty data.frame generates an error; however,
> according to the documentation, it should return an empty data.frame.
Please explain that to me: I don't see it says so.
What I see is
'aggregate.data.frame' is the data frame method. If 'x' is not a
data frame, it is coerced to one. Then, each of the variables
(columns) in 'x' is split into subsets of cases (rows) of
identical combinations of the components of 'by', and 'FUN' is
applied to each such subset with further arguments in '...' passed
to it. (I.e., 'tapply(VAR, by, FUN, ..., simplify = FALSE)' is
done for each variable 'VAR' in 'x', conveniently wrapped into one
call to 'lapply()'.) Empty subsets are removed, and the result is
reformatted into a data frame containing the variables in 'by' and
'x'.
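To make the quoted description concrete, here is a minimal sketch of the documented steps on a small non-empty data frame. The data values and the names 'x', 'by', 'per_var' and 'group_sums_b' are made up for illustration; this is not the actual source of aggregate.data.frame.

```r
# Small non-empty data frame (made-up values, for illustration only).
x  <- data.frame(a = c(1L, 1L, 2L), b = c(0.5, 1.5, 2.5))
by <- list(a = x$a)   # grouping list, playing the role of 'by'

# One tapply() per variable, wrapped in a single lapply() call,
# as the help text describes.
per_var <- lapply(x, function(VAR) tapply(VAR, by, sum, simplify = FALSE))

# Each element holds one FUN result per non-empty group.
group_sums_b <- unlist(per_var$b, use.names = FALSE)
```

The types of the result columns come from the values FUN returns here, not from the input columns.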
Since all the subsets are empty, there is no result to be reformatted.
In particular, the second and third columns of your example have types that
can only be determined by running sum(), and since all groups are empty,
sum() is never run. So we can't create a data frame that would be consistent
with the one the documented algorithm returns for one or more groups.
The error message could definitely be clearer, but I don't see an
alternative to giving an error.
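A short sketch of the type problem, assuming the same 'z' as in the report. The names 'groups', 'types' and 'as_text' are introduced here for illustration only.

```r
z <- data.frame(a = integer(0), b = numeric(0))

# Splitting by an empty grouping variable yields no groups at all.
groups <- split(z$b, z$a)

# FUN is applied once per group, so here it is never called and
# there is no returned value whose type could be inspected.
types <- vapply(groups, function(g) typeof(sum(g)), character(1))

# The result type depends on FUN, not only on the input column:
int_type <- typeof(sum(1:3))        # integer input
dbl_type <- typeof(sum(c(1, 2)))    # double input
as_text  <- function(v) as.character(sum(v))  # hypothetical FUN
chr_type <- typeof(as_text(c(1, 2)))
```

With zero groups, nothing distinguishes these three cases, which is why an empty result cannot be given column types that match the non-empty case.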
> e.g.
> z<-data.frame(a=integer(0),b=numeric(0))
> aggregate(z,by=z[1],FUN=sum)
>
> In a more realistic situation 'z' is of the form z<-zz[cond,] where cond is a
> computed logical vector and zz is a non-empty data.frame.
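For the z <- zz[cond, ] situation, one possible caller-side guard is sketched below. The wrapper name 'aggregate_nonempty' and its 'empty' argument are hypothetical and not part of R. The caller has to say what an empty result should be, since it cannot be derived from FUN.

```r
# Hypothetical helper: skip aggregate() when there are no rows and
# return a caller-supplied value instead (NULL by default).
aggregate_nonempty <- function(x, by, FUN, ..., empty = NULL) {
  if (nrow(x) == 0L) return(empty)
  aggregate(x, by = by, FUN = FUN, ...)
}

zz   <- data.frame(a = c(1L, 1L, 2L), b = c(0.5, 1.5, 2.5))
cond <- zz$b > 10            # selects no rows for this made-up data
z    <- zz[cond, ]

res_empty <- aggregate_nonempty(z["b"],  by = list(a = z$a),  FUN = sum)
res_full  <- aggregate_nonempty(zz["b"], by = list(a = zz$a), FUN = sum)
```

Here res_empty is NULL and res_full has one row per group; a caller who knows the expected column types could pass a zero-row data frame as 'empty' instead.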
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595