[Rd] aggregate: with 2 by variables in the result the 2nd by-variable is wrong (PR#14213)
Peter Ehlers
ehlers at ucalgary.ca
Fri Feb 12 22:01:52 CET 2010
franz.quehenberger at medunigraz.at wrote:
> Full_Name: Franz Quehenberger
> Version: 2.10.1
> OS: Windows XP
> Submission from: (NULL) (145.244.10.3)
>
>
> aggregate is supposed to produce a data.frame that contains a line for each
> combination of levels of the variables in the by list. The first columns of the
> result contain these combinations of levels. With two by-variables the second
> by-variable always takes only one value. However, it works fine with one or
> three by-variables.
>
> The problem seems to be caused by this line of code in aggregate():
>
> w <- as.data.frame(w, stringsAsFactors = FALSE)[which(!unlist(lapply(z,
> is.null))), , drop = FALSE]
>
> or more specifically by:
>
> [which(!unlist(lapply(z, is.null))), , drop = FALSE]
>
> Kind regards
> FQ
>
>
>
> # demonstration of the aggregate bug in R 2.10.1
> factor.a=rep(letters[1:3],4)
> factor.b=rep(letters[4:5],each=3,times=2)
> factor.c=rep(letters[4:5+2],each=6)
> x=1:12
> data=data.frame(factor.a,factor.b,factor.c,x)
> #one by-variable works:
> aggregate(x,list(a=factor.a),FUN=mean)
> #three by-variables work fine:
> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
> #two by-variables do not produce the levels of the second by-variable correctly:
> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
> # data
> print(data)
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> Result of the R code:
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>> # demonstration of the aggregate bug in R 2.10.1
>> factor.a=rep(letters[1:3],4)
>> factor.b=rep(letters[4:5],each=3,times=2)
>> factor.c=rep(letters[4:5+2],each=6)
>> data=data.frame(factor.a,factor.b,factor.c,x)
>> x=1:12
>> #one by-variable works:
>> aggregate(x,list(a=factor.a),FUN=mean)
>   a   x
> 1 a 5.5
> 2 b 6.5
> 3 c 7.5
>> #three by-variables work fine:
>> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
>   a b c x
> 1 a d d 4
> 2 b d d 5
> 3 c d d 6
> 4 a e e 7
> 5 b e e 8
> 6 c e e 9
>> #two by-variables do not produce the levels of the second by-variable correctly:
>> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
>   a b x
> 1 a d 4
> 2 b d 5
> 3 c d 6
> 4 a d 7
> 5 b d 8
> 6 c d 9
> Warning message:
> In data.frame(w, lapply(y, unlist, use.names = FALSE), stringsAsFactors = FALSE) :
>   row names were found from a short variable and have been discarded
>> # data
>> print(data)
>    factor.a factor.b factor.c  x
> 1         a        d        f  1
> 2         b        d        f  2
> 3         c        d        f  3
> 4         a        e        f  4
> 5         b        e        f  5
> 6         c        e        f  6
> 7         a        d        g  7
> 8         b        d        g  8
> 9         c        d        g  9
> 10        a        e        g 10
> 11        b        e        g 11
> 12        c        e        g 12
>
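For reference, the indexing quoted above, [which(!unlist(lapply(z, is.null))), , drop = FALSE], keeps only those rows of the grid of level combinations whose group is non-empty. Below is a minimal sketch of that idea. It is not the actual body of aggregate(): the toy vectors a, b and x and the use of tapply() and expand.grid() to build z and w are assumptions made purely for illustration.

```r
# Toy data in which the combination (a = "c", b = "e") never occurs.
a <- c("a", "b", "c", "a", "b")
b <- c("d", "d", "d", "e", "e")
x <- 1:5

# One cell per combination of levels; with simplify = FALSE the
# empty cells are left as NULL.
z <- tapply(x, list(a = a, b = b), mean, simplify = FALSE)

# Grid of level combinations, first variable varying fastest,
# i.e. in the same order as the cells of z.
w <- expand.grid(dimnames(z), stringsAsFactors = FALSE)

# Keep only the rows whose cell is not NULL, as in the quoted line.
keep <- which(!unlist(lapply(z, is.null)))
w <- w[keep, , drop = FALSE]
w$x <- unlist(z[keep], use.names = FALSE)
print(w)
```

In this sketch the second column of w still carries both "d" and "e" after the subsetting, so the indexing step on its own does not collapse the second by-variable.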
I don't see this in 2.10.1 or in 2.11.0 (Windows Vista).
I can't think of how you might have got your result.
Is there something you haven't mentioned?
What's your sessionInfo()?
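
A minimal sketch of what could be run in a fresh session (for example one started with R --vanilla) to narrow this down. The data are the ones from the report; the check on the second by-variable and the look at masked objects are only suggestions, on the assumption that something in the workspace or on the search path might be interfering.

```r
# Rebuild the vectors from the report (x defined before it is used).
x <- 1:12
factor.a <- rep(letters[1:3], 4)
factor.b <- rep(letters[4:5], each = 3, times = 2)

res <- aggregate(x, list(a = factor.a, b = factor.b), FUN = mean)
print(res)

# Both levels of the second by-variable should appear in the result.
stopifnot(setequal(as.character(res$b), c("d", "e")))

# Objects that appear more than once on the search path; a
# user-defined function hiding a base or stats one would show up here.
conflicts(detail = TRUE)

# Where the aggregate() being called actually lives.
environmentName(environment(aggregate))

sessionInfo()
```

If the stopifnot() call passes in a clean session but fails in the usual one, the difference between the two sessionInfo() outputs would be the first place to look.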
--
Peter Ehlers
University of Calgary