[R] Aggregate behaviour inconsistent (?) when FUN=table

Alain Guillet alain.guillet at uclouvain.be
Tue Feb 6 18:17:08 CET 2018


Thank you for your response. Note that with R 3.4.3, I get the same 
result with simplify=TRUE or simplify=FALSE.
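
For the archive, here is a minimal sketch of the point made in the reply quoted below, using the data from my original post. The names lens_numeric and lens_factor are only illustrative, and the sketch assumes nothing beyond base R:

```r
# Data from the original post (all columns numeric).
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Split first, then tabulate: each piece only knows the values it contains,
# so the tables can have different lengths.
lens_numeric <- sapply(split(df$B, df$C), function(x) length(table(x)))

# Convert B to a factor first: every piece carries both levels, "0" and "1",
# so the tables all have the same length.
lens_factor <- sapply(split(factor(df$B), df$C), function(x) length(table(x)))

print(lens_numeric)
print(lens_factor)

# str() shows the structure of the aggregate result that print() hides.
str(aggregate(df["B"], list(df$C), table))
```

When the per-group tables differ in length, aggregate cannot bind them into one matrix, which is presumably where the "2, 3" cell comes from.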

My problem was that the behaviour differed depending on whether I defined 
my columns as character or as numeric, but a few minutes ago I discovered 
that the data.frame function also has a stringsAsFactors option. So yes, 
it was a stupid question and I apologize for it.
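
In case it helps someone else, here is a small sketch of what I mean. The object names (chr, d_fac, d_chr, t_fac, t_chr) are made up for the illustration, and I set stringsAsFactors explicitly rather than relying on the session default:

```r
chr <- c("1", "0", "1")

# Same character input, with and without conversion to factor.
d_fac <- data.frame(B = chr, stringsAsFactors = TRUE)
d_chr <- data.frame(B = chr, stringsAsFactors = FALSE)

print(class(d_fac$B))
print(class(d_chr$B))

# Tabulate only the rows where B is "1".
# The factor column still reports the unused level "0" with a count of 0;
# the character column only reports the values actually present.
t_fac <- table(d_fac$B[d_fac$B == "1"])
t_chr <- table(d_chr$B[d_chr$B == "1"])

print(t_fac)
print(t_chr)
```

So character columns that data.frame has silently converted to factors behave like my factor example, while numeric columns are left alone and behave like my first example.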


On 06/02/2018 18:07, William Dunlap wrote:
> Don't use aggregate's simplify=TRUE when FUN() produces return
> values of various dimensions.  In your case, the shape of table(subset)'s
> return value depends on the number of levels in the factor 'subset'.
> If you make B a factor before splitting it by C, each split will have the
> same number of levels (2).  If you split it and then let table convert
> each split to a factor, one split will have 1 level and the other 2.
> To see the details of the output, use str() instead of print().
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com <http://tibco.com>
>
> On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet 
> <alain.guillet at uclouvain.be> wrote:
>
>     Dear R users,
>
>     When I use aggregate with table as FUN, I get what I would call
>     strange behaviour if it involves numeric vectors and one of their
>     values does not occur in every group of the "by" variable:
>
>     ---------------------------
>
>     > df <-
>     data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
>     > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>       Group.1 A.0 A.1    B
>     1       0   1   2    3
>     2       1   3   2 2, 3
>
>     > table(df$C,df$B)
>
>         0 1
>       0 3 0
>       1 2 3
>
>     ---------------
>
>     As you can see, a comma appears in the B column of the aggregate
>     output, whereas when I call table directly I obtain the same
>     result as if B were defined as a factor (I suppose this comes from
>     the fact that "non-factor arguments a are coerced via factor",
>     according to the details of the table help). I find this completely
>     normal when I remember that aggregate first splits the data into
>     subsets and then computes the table on each one. But then I don't
>     understand why it works differently with character vectors. Indeed,
>     if I use character vectors, I get the same result as with factors:
>
>     ------------------------
>
>     > df <-
>     data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
>     > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>       Group.1 A.0 A.1 B.0 B.1
>     1       0   1   2   3   0
>     2       1   3   2   2   3
>
>     > df <-
>     data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
>     > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>       Group.1 A.0 A.1 B.0 B.1
>     1       0   1   2   3   0
>     2       1   3   2   2   3
>
>     ---------------------
>
>     Would it be possible to document this behaviour in the
>     aggregate help, since the result is not entirely consistent with
>     what the table help leads one to expect?
>     Or would it be possible to have the same results independently of
>     the vector type? This post was rejected on the R-devel mailing
>     list, so I am asking my question here, as suggested.
>
>
>     Best regards,
>     Alain Guillet
>
>     -- 
>


