[R] How to force aggregate to exclude NA ?
Daren Tan
daren76 at hotmail.com
Sun Dec 7 13:06:29 CET 2008
The aggregate function does almost everything I need to summarize a dataset, except that I can't tell it to exclude NAs without some hassle.
> set.seed(143)
> m <- data.frame(A=sample(LETTERS[1:5], 20, T), B=sample(LETTERS[1:10], 20, T), C=sample(c(NA, 1:4), 20, T), D=sample(c(NA,1:4), 20, T))
> m
A B C D
1 E I 1 NA
2 A C NA NA
3 D I NA 3
4 C I 2 4
5 A C 3 2
6 E J 1 2
7 D J 2 2
8 C G 4 1
9 C D NA 3
10 B G 3 NA
11 C B 4 2
12 A B NA NA
13 E A NA 4
14 B B 3 3
15 E I 4 1
16 E J 3 1
17 B J 4 4
18 B J 1 3
19 D D 4 2
20 B B 4 3
> aggregate(m[,-c(1:2)], by=list(m[,1]), sum)
Group.1 C D
1 A NA NA
2 B 15 NA
3 C NA 10
4 D NA 7
5 E NA NA
> aggregate(m[,-c(1:2)], by=list(m[,1]), length)
Group.1 C D
1 A 3 3
2 B 5 5
3 C 4 4
4 D 3 3
5 E 5 5
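The NA entries in the sum table and the inflated counts in the length table come from the default behaviour of the two functions: sum() returns NA as soon as any input is NA, and length() counts every element, missing or not. A minimal sketch on a small hypothetical vector (x is made up for illustration and is not taken from m):

```r
# Hypothetical vector, not part of the data frame m above.
x <- c(3, NA, 4)

s_default <- sum(x)                # NA: a single missing value propagates
s_narm    <- sum(x, na.rm = TRUE)  # 7: missing values are dropped before summing
n_all     <- length(x)             # 3: length() has no na.rm argument
n_obs     <- sum(!is.na(x))        # 2: number of non-missing values

print(c(s_default, s_narm, n_all, n_obs))
```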
My own versions of length and sum that exclude NA:
> mylength <- function(x) { sum(!is.na(x)) }  # counts non-NA values; as.logical(x) would skip zeros
> mysum <- function(x) {sum(x, na.rm=T)}
> aggregate(m[,-c(1:2)], by=list(m[,1]), mysum)  # this computes correctly
Group.1 C D
1 A 3 2
2 B 15 13
3 C 10 10
4 D 6 7
5 E 9 8
> aggregate(m[,-c(1:2)], by=list(m[,1]), mylength)  # this computes correctly
Group.1 C D
1 A 1 1
2 B 5 4
3 C 3 4
4 D 2 3
5 E 4 4
I also need to compute other statistics, e.g. var and sd, and it is a hassle to write a customized NA-excluding version of each one. Are there any alternative approaches?
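One possible approach, offered as a sketch rather than a definitive answer: aggregate() forwards any extra arguments to FUN, and sum, mean, var and sd all accept na.rm, so the flag can be given once in the aggregate() call instead of writing a wrapper per statistic. The small data frame d below is hypothetical and stands in for m, since a seeded sample() may not reproduce the same values across R versions.

```r
# Hypothetical stand-in for m: one grouping column, two numeric columns with NAs.
d <- data.frame(A = c("a", "a", "a", "b", "b", "b"),
                C = c(1, NA, 3, 2, 4, NA),
                D = c(NA, 2, 4, 1, 1, 3))

# Arguments after FUN are passed on to FUN, here na.rm = TRUE.
sums <- aggregate(d[, c("C", "D")], by = list(d$A), FUN = sum, na.rm = TRUE)
sds  <- aggregate(d[, c("C", "D")], by = list(d$A), FUN = sd,  na.rm = TRUE)

# length() has no na.rm argument, so counting non-missing values
# still needs a small anonymous function.
n_obs <- aggregate(d[, c("C", "D")], by = list(d$A),
                   FUN = function(x) sum(!is.na(x)))

print(sums)
print(sds)
print(n_obs)
```

Note that a group with fewer than two non-missing values would still give NA for var and sd, because the statistic itself is undefined there.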