[R] Aggregate behaviour inconsistent (?) when FUN=table
    Alain Guillet 
    alain.guillet at uclouvain.be
       
    Tue Feb  6 09:20:03 CET 2018
    
    
  
Dear R users,
When I use aggregate with table as FUN, I get what I would call strange 
behaviour when the data are numeric vectors and one of their values does 
not occur in every group defined by the "by" variable:
---------------------------
 > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1    B
1       0   1   2    3
2       1   3   2 2, 3
 > table(df$C,df$B)
     0 1
   0 3 0
   1 2 3
---------------
As you can see, a comma appears in the B column of the aggregate output, 
whereas a direct call to table gives the same result as if B were 
defined as a factor (I suppose this is because "non-factor arguments a 
are coerced via factor", according to the Details of the table help). I 
find this completely normal once I remember that aggregate first splits 
the data into subsets and then computes the table on each subset. But 
then I don't understand why it works differently with character vectors. 
Indeed, if I use character vectors, I get the same result as with factors:
------------------------
 > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),
 +                  B=factor(c("1","0","1","0","0","0","1","0")),
 +                  C=factor(c("1","0","1","0","0","1","1","1")))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3
 > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),
 +                  B=factor(c(1,0,1,0,0,0,1,0)),
 +                  C=factor(c(1,0,1,0,0,1,1,1)))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3
---------------------
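The split-then-tabulate explanation above can be checked by hand. The 
following is only a sketch of roughly what aggregate does for a single 
column, not its actual source; the names num_tabs and fac_tabs are made 
up for illustration. With numeric B, the per-group tables have different 
lengths, so they cannot be simplified to a matrix and stay a list (hence 
the comma); with a factor, every subset keeps both levels.

```r
# Same numeric data as in the first example
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Tabulate numeric B separately within each group of C
num_tabs <- lapply(split(df$B, df$C), table)
lengths(num_tabs)   # group "0" has 1 cell, group "1" has 2 cells

# With a factor, each subset keeps both levels, so the tables line up
fac_tabs <- lapply(split(factor(df$B), df$C), table)
lengths(fac_tabs)   # both groups have 2 cells
```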
Would it be possible to clarify this behaviour in the aggregate help, 
since the result does not quite match what one would expect from the 
table help? Or would it be possible to get the same result regardless 
of the vector type? This post was rejected on the R-devel mailing list, 
so I am asking my question here, as suggested.
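One possible way to get the same layout regardless of the vector type, 
offered only as a sketch: wrap table so that each subset is tabulated 
against a fixed set of levels. The helper name table_fixed and the choice 
levels = c(0, 1) are assumptions that fit this example data; they are not 
something aggregate itself provides.

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Hypothetical helper: tabulate x against fixed levels so that
# every group yields a table of the same length
table_fixed <- function(x, levels = c(0, 1)) {
  table(factor(x, levels = levels))
}

res <- aggregate(df[1:2], list(df$C), table_fixed)
res$B   # a matrix of counts with one row per group of C
```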
Best regards,
Alain Guillet
-- 
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
http://www.uclouvain.be/smcs
Bureau c.316
Voie du Roman Pays, 20 (bte L1.04.01)
B-1348 Louvain-la-Neuve
Belgium
Tel: +32 10 47 30 50
Accès: http://www.uclouvain.be/323631.html
    
    