[R] Aggregate behaviour inconsistent (?) when FUN=table
Alain Guillet
alain.guillet at uclouvain.be
Tue Feb 6 18:17:08 CET 2018
Thank you for your response. Note that with R 3.4.3, I get the same
result with simplify=TRUE or simplify=FALSE.
My problem was that the behaviour differed depending on whether I defined
my columns as character or as numeric, but a few minutes ago I discovered
that the function data.frame also has a stringsAsFactors option. So yes,
it was a stupid question and I apologize for it.
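
For anyone finding this thread later, here is a minimal sketch of what goes on, using the data from my original post. The object names (per_group_numeric, df_f, res, ...) are only illustrative. The conversion to factor is done explicitly instead of relying on stringsAsFactors, because its default depends on the R version (it became FALSE in R 4.0.0).

```r
# Same data as in the original post (numeric columns).
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# table() applied to each split of the numeric B: the group C == 0 only
# contains the value 0, so its table has one cell and the other has two.
per_group_numeric <- lapply(split(df$B, df$C), table)
lengths_numeric <- lengths(per_group_numeric)

# Converting to factor first fixes the set of levels before splitting,
# so every group yields a table of the same length.
df_f <- df
df_f[c("A", "B")] <- lapply(df[c("A", "B")], factor)
per_group_factor <- lapply(split(df_f$B, df_f$C), table)
lengths_factor <- lengths(per_group_factor)

# With equal-length results, aggregate can simplify B into a matrix column.
res <- aggregate(df_f[1:2], list(df_f$C), table)
str(res)  # str() shows the matrix columns that print() flattens
```

With the numeric columns the per-group tables have different lengths, so aggregate cannot bind them into a matrix and B stays a list column, which is where the comma in the printed output comes from.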
On 06/02/2018 18:07, William Dunlap wrote:
> Don't use aggregate's simplify=TRUE when FUN() produces return
> values of various dimensions. In your case, the shape of table(subset)'s
> return value depends on the number of levels in the factor 'subset'.
> If you make B a factor before splitting it by C, each split will have the
> same number of levels (2). If you split it and then let table convert
> each split to a factor, one split will have 1 level and the other 2.
> To see the details of the output, use str() instead of print().
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet
> <alain.guillet at uclouvain.be> wrote:
>
> Dear R users,
>
> When I use aggregate with table as FUN, I get what I would call a
> strange behaviour when it involves numeric vectors and one of their
> values does not occur in every group of the "by" variable:
>
> ---------------------------
>
> > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1    B
> 1       0   1   2    3
> 2       1   3   2 2, 3
>
> > table(df$C,df$B)
>
>     0 1
>   0 3 0
>   1 2 3
>
> ---------------
>
> As you can see, a comma appears in the column with the variable B
> in the aggregate whereas when I call table I obtain the same
> result as if B was defined as a factor (I suppose it comes from
> the fact "non-factor arguments a are coerced via factor" according
> to the details of the table help). That seems perfectly normal once
> I remember that aggregate first splits the data into subsets and
> then computes the table on each one. But then I don't understand why
> it behaves differently with character vectors. Indeed, if I use
> character vectors, I get the same result as with factors:
>
> ------------------------
>
> > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> ---------------------
>
> Would it be possible to clarify this behaviour in the aggregate
> help, since the result does not fully match what the table help
> leads one to expect? Or would it be possible to get the same
> results independently of the vector type? This post was rejected
> on the R-devel mailing list, so I am asking my question here as
> suggested.
>
>
> Best regards,
> Alain Guillet
>