[R] tapply
Marc Schwartz
MSchwartz at mn.rr.com
Tue Jun 21 01:46:58 CEST 2005
On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> hi,
> i have another question on tapply:
> i have a dataset z like this:
> 5540 389100307391 2600
> 5541 389100307391 2600
> 5542 389100307391 2600
> 5543 389100307391 2600
> 5544 389100307391 2600
> 5546 381300302513 NA
> 5547 387000307470 NA
> 5548 387000307470 NA
> 5549 387000307470 NA
> 5550 387000307470 NA
> 5551 387000307470 NA
> 5552 387000307470 NA
>
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
>
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.
>
> please help.
The INDEX argument of tapply() is documented as a "list" of factors, so
wrap the grouping vector in list(). See the description of the INDEX
argument in ?tapply:
> tapply(z[[3]],list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391
           0            0        13000
Note that using na.rm = TRUE here yields a misleading 0 for the other
two groups. Every value in those groups is NA, so once the NA's are
dropped nothing is left to add, and sum() of an empty vector is 0. This
is not self-evident unless you know the data.
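A minimal sketch of why the zeros appear, using a made-up vector rather
than the poster's data (the names all_missing, with_rm, without_rm and
empty_sum are only for illustration):

```r
# Hypothetical stand-in for one group whose values are all missing
all_missing <- c(NA_real_, NA_real_)

with_rm    <- sum(all_missing, na.rm = TRUE)  # NA's dropped first, nothing left to add
without_rm <- sum(all_missing)                # NA propagates through the sum
empty_sum  <- sum(numeric(0))                 # sum of an empty numeric vector

print(c(with_rm = with_rm, without_rm = without_rm, empty_sum = empty_sum))
```

The first and third results agree, which is why an all-NA group is
indistinguishable from a group that truly sums to zero.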
You may be better off with:
> tapply(z[[3]],list(z[[2]]), sum)
381300302513 387000307470 389100307391
          NA           NA        13000
unless your real data is a mix of NA's and measured values.
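If the real data do mix NA's and measured values, one possible approach
is a small wrapper that returns NA only when a whole group is missing.
This is just a sketch: the data frame, its column names (id, val) and
the helper sum_or_na() are invented for illustration and are not part
of the original question.

```r
# Hypothetical data: group "a" is complete, "b" is all NA, "c" is mixed
z <- data.frame(
  id  = c("a", "a", "b", "b", "c", "c"),
  val = c(2600, 2600, NA, NA, 100, NA)
)

# Hypothetical helper: NA if every value is missing,
# otherwise the sum of the observed values
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

res <- tapply(z$val, list(z$id), sum_or_na)
print(res)
```

Here group "b" stays NA instead of turning into 0, while group "c" is
still summed over its one observed value.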
Also see ?complete.cases and ?na.omit for further approaches to dealing
with such data sets.
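A rough sketch of those two functions on made-up data (the data frame
and the column names id and val are assumptions for illustration, not
the poster's actual object):

```r
# Hypothetical data: group "a" is complete, "b" is all NA, "c" is mixed
z <- data.frame(
  id  = c("a", "a", "b", "b", "c", "c"),
  val = c(2600, 2600, NA, NA, 100, NA)
)

# Two ways to keep only the rows with no missing values
z_cc <- z[complete.cases(z), ]
z_no <- na.omit(z)

# as.character() drops any unused factor levels from the grouping,
# so a group that lost all of its rows does not appear in the result
res <- tapply(z_cc$val, list(as.character(z_cc$id)), sum)
print(res)
```

Note the trade-off: after dropping incomplete rows, a group that was
entirely NA disappears from the result rather than showing up as NA.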
HTH,
Marc Schwartz