[R] Aggregate behaviour inconsistent (?) when FUN=table
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Tue Feb 6 16:33:45 CET 2018
The normal input to a factory that builds cars is car parts. Feeding whole trucks into such a factory is likely to yield odd-looking results.
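Both aggregate and table do similar kinds of things, but yield differently constructed outputs. The output of the table function is not well-suited to serve as the aggregated value that the aggregate function compiles into a data frame. aggregate calls FUN separately on each subset, and table applied to a non-factor vector only counts the values that occur in that subset, so the results can have different lengths in different groups. When the lengths agree, aggregate simplifies them into a matrix column (your A.0 and A.1); when they don't, it leaves a list column, which is where your "2, 3" comes from. Factors keep their full set of levels in every subset, so their lengths always agree. Having aggregate call the table function on non-factor columns will therefore yield surprises like this one.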
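A minimal sketch of that mechanism, using the data from the original post. The split()/lapply() step only imitates what aggregate does internally for one column. Converting every column to a factor with levels = c(0, 1) is one possible workaround, assuming 0 and 1 are the only values of interest. (The character vectors in the original question most likely behaved like factors because data.frame() converted strings to factors by default before R 4.0.0.)

```r
# Data from the original post: all three columns are numeric.
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Imitate what aggregate() does for column B: split it by the grouping
# variable and call table() on each piece. Group "0" contains only B == 0,
# so its table has one cell; group "1" has two cells.
per_group <- lapply(split(df$B, df$C), table)
lengths(per_group)

# Unequal lengths cannot be stacked into a matrix, so B stays a list column.
res_num <- aggregate(df[c("A", "B")], list(C = df$C), table)
is.list(res_num$B)

# Possible workaround (assumes 0 and 1 are the only relevant values):
# give every column the same explicit factor levels, so each subset's
# table has the same cells, including zero counts.
df_f <- data.frame(lapply(df, factor, levels = c(0, 1)))
res_fac <- aggregate(df_f[c("A", "B")], list(C = df_f$C), table)
res_fac
```

With the factor version, B is split into B.0 and B.1 columns just like A, and the empty C = 0, B = 1 cell shows up as a 0 instead of disappearing.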
I am having some difficulty deciphering what it is you are trying to accomplish with all this, so I will guess that you are trying to reproduce the information output from
table( df$C, df$B )
in which case you could use
aggregate( df$A, df[ , c( "C", "B" ) ], length )
If that isn't what you want, perhaps you can clarify what result you want to see and we can help you get there.
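A minimal sketch comparing that aggregate call with as.data.frame() applied to the table itself, using the data from the original post. The aggregate version only returns the combinations of C and B that actually occur, whereas the table version keeps the empty C = 0, B = 1 cell with a count of zero; which of the two is preferable depends on what you are after.

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Count rows per (C, B) combination; combinations with no rows are omitted.
agg <- aggregate(df$A, df[, c("C", "B")], length)
agg

# Long-format version of table(): one row per cell, including zero counts.
tab_df <- as.data.frame(table(C = df$C, B = df$B))
tab_df
```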
--
Sent from my phone. Please excuse my brevity.
On February 6, 2018 12:20:03 AM PST, Alain Guillet <alain.guillet at uclouvain.be> wrote:
>Dear R users,
>
>When I use aggregate with table as FUN, I get what I would call a
>strange behaviour if it involves numerical vectors and one "level" of
>it is not present for every "level" of the "by" variable:
>
>---------------------------
>
> > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
> Group.1 A.0 A.1 B
>1 0 1 2 3
>2 1 3 2 2, 3
>
> > table(df$C,df$B)
>
> 0 1
> 0 3 0
> 1 2 3
>
>---------------
>
>As you can see, a comma appears in the column for the variable B in
>the aggregate output, whereas when I call table I obtain the same result
>as if B were defined as a factor (I suppose this comes from the fact that
>"non-factor arguments a are coerced via factor", according to the Details
>section of the table help). I find this completely normal if I remember
>that aggregate first splits the data into subsets and then computes the
>table. But then I don't understand why it works differently with
>character vectors. Indeed, if I use character vectors, I get the same
>result as with factors:
>
>------------------------
>
> > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
> Group.1 A.0 A.1 B.0 B.1
>1 0 1 2 3 0
>2 1 3 2 2 3
>
> > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
> Group.1 A.0 A.1 B.0 B.1
>1 0 1 2 3 0
>2 1 3 2 2 3
>
>---------------------
>
>Would it be possible to describe this behaviour more precisely in the
>aggregate help, since the result is not completely compatible with what
>one would expect from the table help? Or would it be possible to have the
>same results independently of the vector type? This post was rejected
>on the R-devel mailing list, so I am asking my question here as suggested.
>
>
>Best regards,
>Alain Guillet