[Rd] aggregate: with 2 by variables in the result the 2nd by-variable is wrong (PR#14213)

Fri Feb 12 13:50:11 CET 2010

Full_Name: Franz Quehenberger
Version: 2.10.1
OS: Windows XP
Submission from: (NULL) (145.244.10.3)

aggregate is supposed to produce a data.frame that contains one row for each
combination of levels of the variables in the by list. The first columns of the
result contain these combinations of levels. With two by-variables, the second
by-variable always takes only one value. However, it works fine with one or
three by-variables.
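
For reference, the expected two-by-variable result can be computed without
aggregate(). This is only a minimal sketch using the data from the
demonstration script; the names `means` and `expected` are introduced here
for illustration and are not part of aggregate():

```r
# Data from the demonstration script
factor.a <- rep(letters[1:3], 4)
factor.b <- rep(letters[4:5], each = 3, times = 2)
x <- 1:12

# Mean of x for every (a, b) combination, computed with tapply()
means <- tapply(x, list(a = factor.a, b = factor.b), mean)

# Flatten the 3 x 2 table to one row per combination of levels
expected <- as.data.frame(as.table(means), responseName = "x",
                          stringsAsFactors = FALSE)
print(expected)
```

In this sketch the column b takes both levels, d and e, which is what the
two-by-variable call to aggregate() should also report.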

The problem seems to be caused by this line of code in aggregate():

    w <- as.data.frame(w, stringsAsFactors = FALSE)[
        which(!unlist(lapply(z, is.null))), , drop = FALSE]

or more specifically by: 

    [which(!unlist(lapply(z, is.null))), , drop = FALSE]
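
To make clear what this index expression does in isolation, here is a small
sketch. The list `z` and the data frame `w` below are toy values made up for
illustration; they are not the objects that aggregate() builds internally:

```r
# Toy list with one NULL entry (hypothetical, not aggregate()'s real z)
z <- list(1:2, NULL, 3)

# Positions of the non-NULL entries
keep <- which(!unlist(lapply(z, is.null)))
print(keep)

# Toy data frame of by-variable combinations (hypothetical)
w <- data.frame(a = c("a", "b", "c"), b = c("d", "e", "d"),
                stringsAsFactors = FALSE)

# Keep only the rows whose entry in z is non-NULL
kept <- w[keep, , drop = FALSE]
print(kept)
```

The sketch only shows that the expression selects rows by position; it does
not by itself reproduce the wrong second by-variable.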

Kind regards
FQ

# demonstration of the aggregate bug in R 2.10.1
factor.a=rep(letters[1:3],4)
factor.b=rep(letters[4:5],each=3,times=2)
factor.c=rep(letters[4:5+2],each=6)
# x must exist before data.frame() uses it
x=1:12
data=data.frame(factor.a,factor.b,factor.c,x)
#one by-variable works:
aggregate(x,list(a=factor.a),FUN=mean)
#three by-variables work fine:
aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
#two by-variables do not produce the levels of the second by-variable correctly:
aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
# data
print(data)
++++++++++++++++++++++++++++++++++++++++++++++++++++
Result of the R code:
++++++++++++++++++++++++++++++++++++++++++++++++++++

> # demonstration of the aggregate bug in R 2.10.1
> factor.a=rep(letters[1:3],4)
> factor.b=rep(letters[4:5],each=3,times=2)
> factor.c=rep(letters[4:5+2],each=6)
> x=1:12
> data=data.frame(factor.a,factor.b,factor.c,x)
> #one by-variable works:
> aggregate(x,list(a=factor.a),FUN=mean)
  a   x
1 a 5.5
2 b 6.5
3 c 7.5
> #three by-variables work fine:
> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
  a b c x
1 a d d 4
2 b d d 5
3 c d d 6
4 a e e 7
5 b e e 8
6 c e e 9
> #two by-variables do not produce the levels of the second by-variable correctly:
> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
  a b x
1 a d 4
2 b d 5
3 c d 6
4 a d 7
5 b d 8
6 c d 9
Warning message:
In data.frame(w, lapply(y, unlist, use.names = FALSE), stringsAsFactors = FALSE) :
  row names were found from a short variable and have been discarded
> # data
> print(data)
   factor.a factor.b factor.c  x
1         a        d        f  1
2         b        d        f  2
3         c        d        f  3
4         a        e        f  4
5         b        e        f  5
6         c        e        f  6
7         a        d        g  7
8         b        d        g  8
9         c        d        g  9
10        a        e        g 10
11        b        e        g 11
12        c        e        g 12
>