[R] Aggregate behaviour inconsistent (?) when FUN=table

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Feb 6 16:33:45 CET 2018


The normal input to a factory that builds cars is car parts. Feeding whole trucks into such a factory is likely to yield odd-looking results.

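Both aggregate and table do similar kinds of things, but yield differently constructed outputs. The table function returns a vector of counts whose length depends on which values occur in the data it is given, so when aggregate calls it on each subset separately the per-group results need not have the same shape. That makes the output of table poorly suited to being the aggregated value that aggregate compiles into a data frame, and having aggregate call table will yield surprises.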
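
To make the mismatch concrete, here is a minimal sketch using the data frame from the original post. It mimics the split-then-apply step by hand with split and lapply rather than calling aggregate itself, and the names by_A, by_B, by_B_factor, lens_A and lens_B are mine, chosen only for illustration.

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Split each numeric column by C, then tabulate every subset on its own.
# table() only sees the values that are present in that subset.
by_A <- lapply(split(df$A, df$C), table)
by_B <- lapply(split(df$B, df$C), table)

lens_A <- lengths(by_A)  # both groups contain A == 0 and A == 1
lens_B <- lengths(by_B)  # group C == 0 contains only B == 0

# Converting to a factor first fixes the set of levels, so every
# subset is tabulated against the same levels (zero counts included).
by_B_factor <- lapply(split(factor(df$B), df$C), table)

print(lens_A)
print(lens_B)
print(lengths(by_B_factor))
```

With the numeric B column the two groups give results of length 1 and 2, which cannot be combined into regular columns, whereas the factor version gives two counts per group.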

I am having some difficulty deciphering what it is you are trying to accomplish with all this, so I will guess that you are trying to reproduce the information output from

table( df$C, df$B )

in which case you could use

aggregate( df$A, df[ , c( "C", "B" ) ], length )

but if that isn't what you want then perhaps you can clarify what result you want to see and we can help you get there. 
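
As a rough illustration of how the two calls above relate, the following sketch runs both on the data frame from the original post and puts the table result into long form for comparison. The object names counts_wide, counts_long, wide_as_long and nonempty are hypothetical, and the "x" column name is simply what aggregate assigns when it is given an unnamed vector.

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Two-way contingency table: one cell per (C, B) combination,
# including combinations that never occur (count 0).
counts_wide <- table(C = df$C, B = df$B)

# One row per (C, B) combination that actually occurs, with the
# number of rows in that combination stored in column "x".
counts_long <- aggregate(df$A, df[, c("C", "B")], length)

# Long form of the table, keeping only the non-empty cells,
# for a side-by-side comparison with counts_long.
wide_as_long <- as.data.frame(counts_wide, responseName = "x")
nonempty <- wide_as_long[wide_as_long$x > 0, ]

print(counts_wide)
print(counts_long)
print(nonempty)
```

The counts agree; the difference is that table keeps the empty C = 0, B = 1 cell while aggregate leaves that combination out.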
-- 
Sent from my phone. Please excuse my brevity.

On February 6, 2018 12:20:03 AM PST, Alain Guillet <alain.guillet at uclouvain.be> wrote:
>Dear R users,
>
>When I use aggregate with table as FUN, I get what I would call
>strange behaviour if it involves numeric vectors and one "level" of
>a vector is not present for every level of the "by" variable:
>
>---------------------------
>
> > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1    B
>1       0   1   2    3
>2       1   3   2 2, 3
>
> > table(df$C,df$B)
>
>     0 1
>   0 3 0
>   1 2 3
>
>---------------
>
>As you can see, a comma appears in the B column of the aggregate
>output, whereas when I call table directly I obtain the same result as
>if B were defined as a factor (I suppose this comes from the fact that
>"non-factor arguments a are coerced via factor", according to the
>details of the table help). I find this completely normal once I
>remember that aggregate first splits the data into subsets and then
>computes the table on each one. But then I don't understand why it
>works differently with character vectors. Indeed, if I use character
>vectors, I get the same result as with factors:
>
>------------------------
>
> > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
>1       0   1   2   3   0
>2       1   3   2   2   3
>
> > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
>1       0   1   2   3   0
>2       1   3   2   2   3
>
>---------------------
>
>Would it be possible to document this behaviour in the aggregate help,
>since the result does not match what the table help leads one to
>expect? Or would it be possible to get the same result regardless of
>the vector type? This post was rejected on the R-devel mailing list,
>so I am asking my question here as suggested.
>
>
>Best regards,
>Alain Guillet


