[R] How to force aggregate to exclude NA ?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Dec 7 16:06:22 CET 2008


Actually, the second aggregate and second rowsum calls don't need na.rm = TRUE:
!is.na(...) never contains NA, so for the counts we only need:

aggregate(!is.na(m[, -(1:2)]), m[1], sum)
rowsum(0+!is.na(m[, -(1:2)]), m[,1])
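
As a quick check, here is a sketch that rebuilds m by hand from the table printed in the original post (an assumption made for reproducibility, since sample() may not give the same draws in every R version) and runs the NA-excluding sums next to the simplified counts. The names sums and counts are only illustrative.

```r
# Rebuild m from the table printed in the original post (typed in by hand,
# because sample() may not reproduce the same draws in every R version).
m <- data.frame(
  A = c("E","A","D","C","A","E","D","C","C","B",
        "C","A","E","B","E","E","B","B","D","B"),
  B = c("I","C","I","I","C","J","J","G","D","G",
        "B","B","A","B","I","J","J","J","D","B"),
  C = c(1, NA, NA, 2, 3, 1, 2, 4, NA, 3, 4, NA, NA, 3, 4, 3, 4, 1, 4, 4),
  D = c(NA, NA, 3, 4, 2, 2, 2, 1, 3, NA, 2, NA, 4, 3, 1, 1, 4, 3, 2, 3)
)

# Sums per group, ignoring NA: na.rm = TRUE is passed through to sum().
sums <- aggregate(m[, -(1:2)], m[1], sum, na.rm = TRUE)

# Counts of non-NA values per group: !is.na() contains no NA,
# so a plain sum() is enough here.
counts <- aggregate(!is.na(m[, -(1:2)]), m[1], sum)

print(sums)
print(counts)
```

With this hand-typed copy of the data, both results should agree with the mysum and mylength tables quoted below.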

You might also want to look at summaryBy in the doBy package.
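
For the other statistics mentioned in the original question (var, sd), the same pattern should carry over in base R, because aggregate hands any extra arguments on to FUN. A minimal sketch follows; the data frame d and its columns g, x, y are made up purely for illustration.

```r
# Hypothetical toy data, invented for this illustration only.
d <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, NA, 3, 2, 4, NA),
  y = c(NA, 2, 4, 1, 3, 5)
)

# Arguments after FUN are passed on to FUN, so na.rm = TRUE reaches
# var() and sd() without writing wrapper functions.
vars <- aggregate(d[, -1], d[1], var, na.rm = TRUE)
sds  <- aggregate(d[, -1], d[1], sd,  na.rm = TRUE)

print(vars)
print(sds)
```

This only helps for functions that accept an na.rm argument; for others a small wrapper that drops NA first would still be needed.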

On Sun, Dec 7, 2008 at 7:43 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try
>
> aggregate(m[, -(1:2)], m[1], sum, na.rm = TRUE)
> aggregate(!is.na(m[, -(1:2)]), m[1], sum, na.rm = TRUE)
>
> # or (here the result carries the groups as row names rather than as a column):
>
> rowsum(m[, -(1:2)], m[,1], na.rm = TRUE)
> rowsum(0+!is.na(m[, -(1:2)]), m[,1], na.rm = TRUE)
>
>
> On Sun, Dec 7, 2008 at 7:06 AM, Daren Tan <daren76 at hotmail.com> wrote:
>>
>> The aggregate function does "almost" all that I need to summarize a dataset, except that I can't exclude NAs without a bit of hassle.
>>
>>> set.seed(143)
>>> m <- data.frame(A=sample(LETTERS[1:5], 20, TRUE), B=sample(LETTERS[1:10], 20, TRUE), C=sample(c(NA, 1:4), 20, TRUE), D=sample(c(NA,1:4), 20, TRUE))
>>> m
>>   A B  C  D
>> 1  E I  1 NA
>> 2  A C NA NA
>> 3  D I NA  3
>> 4  C I  2  4
>> 5  A C  3  2
>> 6  E J  1  2
>> 7  D J  2  2
>> 8  C G  4  1
>> 9  C D NA  3
>> 10 B G  3 NA
>> 11 C B  4  2
>> 12 A B NA NA
>> 13 E A NA  4
>> 14 B B  3  3
>> 15 E I  4  1
>> 16 E J  3  1
>> 17 B J  4  4
>> 18 B J  1  3
>> 19 D D  4  2
>> 20 B B  4  3
>>
>>> aggregate(m[,-c(1:2)], by=list(m[,1]), sum)
>>  Group.1  C  D
>> 1       A NA NA
>> 2       B 15 NA
>> 3       C NA 10
>> 4       D NA  7
>> 5       E NA NA
>>
>>> aggregate(m[,-c(1:2)], by=list(m[,1]), length)
>>  Group.1 C D
>> 1       A 3 3
>> 2       B 5 5
>> 3       C 4 4
>> 4       D 3 3
>> 5       E 5 5
>>
>> My own versions of length and sum that exclude NA:
>>
>>> mylength <- function(x) { sum(!is.na(x)) }  # counts non-NA values; as.logical(x) would also drop zeros
>>> mysum <- function(x) { sum(x, na.rm=TRUE) }
>>
>>> aggregate(m[,-c(1:2)], by=list(m[,1]), mysum)   # this computes correctly
>>  Group.1  C  D
>> 1       A  3  2
>> 2       B 15 13
>> 3       C 10 10
>> 4       D  6  7
>> 5       E  9  8
>>
>>> aggregate(m[,-c(1:2)], by=list(m[,1]), mylength)   # this computes correctly
>>  Group.1 C D
>> 1       A 1 1
>> 2       B 5 4
>> 3       C 3 4
>> 4       D 2 3
>> 5       E 4 4
>>
>> There are other statistics I need to compute, e.g. var and sd, and it is a hassle to write a customized version of each one to exclude NA. Are there any alternative approaches?


