[R] strange answer when using 'aggregate()' with a formula
Chel Hee Lee
chl948 at mail.usask.ca
Thu Jan 21 05:08:05 CET 2016
Could you kindly test the following code? I am getting a strange
answer when 'aggregate()' is used with a formula.
I am trying to count how many missing data entries are in each group.
For this exercise, I created data as below:
> tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> tmp
  grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5
The observations (variable y) fall into two groups (variable grp).
For group 2, y has NA and 3.0. For group 3, y has 0.5 and 0.5. Hence,
the numbers of missing values are 1 and 0 for groups 2 and 3,
respectively. This count can be obtained using 'aggregate()' in the
'stats' package as below:
> aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
  grp x
1   2 1
2   3 0
A formula can be used as below:
> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
  grp y
1   2 0
2   3 0
What a surprise! The formula version reports 0 missing values for
group 2 instead of 1. Is this a bug? I would appreciate it if you
could share the results after testing the code. Thank you so much for
your help in advance!
Chel Hee Lee
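
A likely explanation for the difference, offered as a sketch rather than a confirmed diagnosis of the poster's session: the formula method of 'aggregate()' has an 'na.action' argument that defaults to 'na.omit', so rows containing NA are dropped before the function is applied, and the count of NAs is then always 0. The example below rebuilds the same 'tmp' data and compares the default with 'na.action = na.pass'. The names 'res_omit' and 'res_pass' are illustrative only and do not appear in the original post.

```r
# Same data as in the post.
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Helper that counts missing values in a vector.
count_na <- function(x) sum(is.na(x))

# Default formula method: na.action = na.omit removes row 1 (y is NA)
# before grouping, so no NA is left to count.
res_omit <- aggregate(y ~ grp, data = tmp, FUN = count_na)
print(res_omit)   # y is 0 for both groups

# Keeping the NA rows with na.pass lets the function see them.
res_pass <- aggregate(y ~ grp, data = tmp, FUN = count_na,
                      na.action = na.pass)
print(res_pass)   # y is 1 for grp 2 and 0 for grp 3
```

With 'na.action = na.pass' the formula call agrees with the 'x=' / 'by=' call, which has no 'na.action' step and so passes the NA through to the function unchanged.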