[R] Aggregate behaviour inconsistent (?) when FUN=table

Alain Guillet alain.guillet at uclouvain.be
Tue Feb 6 09:20:03 CET 2018


Dear R users,

When I use aggregate() with table as FUN, I get what I would call a 
strange behaviour when a numeric vector is involved and one of its 
"levels" is not present in every "level" of the "by" variable:

---------------------------

 > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1    B
1       0   1   2    3
2       1   3   2 2, 3

 > table(df$C,df$B)

     0 1
   0 3 0
   1 2 3

---------------
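The per-group step can be imitated directly. This is a rough sketch that assumes aggregate() calls FUN on the split() pieces of each column, which is how aggregate.data.frame appears to work; the object names tabs_num and tabs_fac are only illustrative, and lengths() needs R >= 3.2.0:

```r
# Data from the first example above
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Imitate the per-group step for column B: split it by C, then call
# table() on each piece. table() builds a factor from each piece alone,
# so a value absent from a group gets no cell in that group's table.
tabs_num <- lapply(split(df$B, df$C), table)
print(tabs_num)
print(lengths(tabs_num))  # group "0" has 1 cell, group "1" has 2

# A factor keeps all of its levels when split, so every per-group
# table has one cell per level (2 here).
tabs_fac <- lapply(split(factor(df$B), df$C), table)
print(lengths(tabs_fac))
```

Because the per-group tables for the numeric B have different lengths, they cannot be stacked into a matrix, so B stays a list, which is printed as "3" and "2, 3".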

As you can see, a comma appears in the B column of the aggregate() 
output, whereas when I call table() directly I obtain the same result as 
if B were defined as a factor (I suppose this is because "non-factor 
arguments a are coerced via factor", according to the Details section of 
the table help). This seems perfectly normal once I remember that 
aggregate first splits the data into subsets and then computes the table 
on each subset. But then I don't understand why it works differently with 
character vectors. Indeed, if I use character vectors, I get the same 
result as with factors:

------------------------

 > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

 > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

---------------------
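The character-vector case may hinge on data.frame() itself: with stringsAsFactors = TRUE (the default before R 4.0.0), character vectors are turned into factors, and a factor keeps all of its levels in every group. The sketch below, which uses only base R and illustrative object names (df_chr, res_chr, df_fac, res_fac), keeps the columns as plain character vectors to check this, and then shows one possible workaround: converting the tabulated columns to factors before calling aggregate():

```r
# Same data as character vectors, explicitly NOT converted to factors
# (stringsAsFactors = TRUE was the data.frame() default before R 4.0.0).
df_chr <- data.frame(A = c("1", "1", "1", "1", "0", "0", "0", "0"),
                     B = c("1", "0", "1", "0", "0", "0", "1", "0"),
                     C = c("1", "0", "1", "0", "0", "1", "1", "1"),
                     stringsAsFactors = FALSE)

# Plain character columns behave like the numeric ones: B's per-group
# tables differ in length, so aggregate() leaves B as a list column.
res_chr <- aggregate(df_chr[1:2], list(df_chr$C), table)
print(res_chr)
print(is.list(res_chr$B))

# Converting the tabulated columns to factors first gives every group the
# same set of levels, so each column simplifies to a matrix.
df_fac <- df_chr
df_fac[1:2] <- lapply(df_fac[1:2], factor)
res_fac <- aggregate(df_fac[1:2], list(df_fac$C), table)
print(res_fac)
```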

Would it be possible to clarify this behaviour in the aggregate help 
page, since the result does not fully match what one would expect from 
the table help page? Or would it be possible to get the same results 
regardless of the vector type? This post was rejected on the R-devel 
mailing list, so I am asking my question here as suggested.


Best regards,
Alain Guillet

-- 
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
http://www.uclouvain.be/smcs

Office c.316
Voie du Roman Pays, 20 (bte L1.04.01)
B-1348 Louvain-la-Neuve
Belgium

Tel: +32 10 47 30 50

Access: http://www.uclouvain.be/323631.html
