[R] aggregate() function and na.rm = TRUE

Daniel Malter daniel at umd.edu
Tue Jul 8 23:59:57 CEST 2008


That may be because you have "empty" groups: in your example, ALL rows with
Hour=0 have Y2=NA, so removing the NAs leaves nothing to compute on. The
following example may illustrate the point. The first two aggregate() calls
run on data that contain NAs, but the NAs are not perfectly collinear with
any level by which you are grouping, so every group keeps some observed
values. The second pair of calls, which adds x3, behaves as your example does.

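x1=rep(c(0,1),each=48)   ## first grouping variable: 48 zeros, then 48 ones
x2=rep(c(0,1),48)        ## second grouping variable: alternating 0/1
x1=c(x1,NA,NA,NA,NA)     ## NAs at the end of x1 ...
x2=c(NA,NA,NA,NA,x2)     ## ... and at the start of x2; both now have length 100
x3=rnorm(100,0,1)
x3=ifelse(x1==1,NA,x3) ##All x3=NA if x1=1 (and where x1 is NA)
y=rnorm(100,0,1)
y=sort(y)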

aggregate(y,by=list(x1,x2),FUN=mean)
aggregate(y,by=list(x1,x2),FUN=sd)

aggregate(list(y,x3),by=list(x1,x2),FUN=mean)
aggregate(list(y,x3),by=list(x1,x2),FUN=sd)
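
One possible workaround, sketched here under assumptions rather than taken
from the thread: first check which groups have no observed values at all,
then aggregate with a small wrapper that returns NA for such groups instead
of calling sd() on them. The helper name safe_sd and the set.seed() call are
hypothetical additions for illustration; the data frame is the one from the
quoted message.

```r
# Data from the quoted message. Y1 is random; the seed is an added
# assumption so that the sketch is repeatable.
set.seed(1)
dat <- data.frame(Hour = c(0, 0, 0, 0, 1, 1, 1, 1),
                  Drug = factor(c("P", "D", "P", "D", "P", "D", "P", "D")),
                  Y1 = rnorm(8, 0),
                  Y2 = c(NA, NA, NA, NA, 1, 2, 3, 4))

# Which Hour/Drug groups contain no observed Y2 at all?
all_na <- aggregate(dat["Y2"], dat[c("Hour", "Drug")],
                    function(v) all(is.na(v)))

# Hypothetical helper: drop the NAs ourselves and return NA when fewer
# than two values remain, so sd() never sees an empty group.
safe_sd <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) < 2) NA_real_ else sd(v)
}

res <- aggregate(dat[c("Y1", "Y2")], dat[c("Hour", "Drug")], safe_sd)
print(all_na)
print(res)
```

With this data, Y2 comes back as NA for both Hour=0 groups and as sqrt(2)
for both Hour=1 groups, and no error is raised. The wrapper does its own NA
handling, so na.rm = TRUE is not passed through aggregate() here.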

Best,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of David Afshartous
Sent: Tuesday, July 08, 2008 4:57 PM
To: r-help at r-project.org
Subject: [R] aggregate() function and na.rm = TRUE



All,

I've been using aggregate() to compute means and standard deviations at
time/treatment combinations for a longitudinal dataset, using na.rm = TRUE
for missing data. 

This was working fine before, but now when I re-run some old code it isn't.
I've retraced my steps and can't find out why it was working before but not
now. In any event, below is a reproducible example of the current problem,
namely that calculating the standard deviation via aggregate() with
na.rm = TRUE fails.

Thanks,
David






dat = data.frame(Hour = c(0, 0, 0, 0, 1, 1, 1, 1),
                 Drug = factor(c("P", "D", "P", "D", "P", "D", "P", "D")),
                 Y1 = rnorm(8, 0),
                 Y2 = c(NA, NA, NA, NA, 1, 2, 3, 4))

> aggregate(dat[c(3,4)], dat[c(1,2)], mean)
  Hour Drug          Y1 Y2
1    0    D -0.75534554 NA
2    1    D  0.27529835  3
3    0    P -0.03949923 NA
4    1    P  0.02627489  2
> aggregate(dat[c(3,4)], dat[c(1,2)], sd)
Error in var(x, na.rm = na.rm) : missing observations in cov/cor
> aggregate(dat[c(3,4)], dat[c(1,2)], sd, na.rm = TRUE)
Error in var(x, na.rm = na.rm) : no complete element pairs


> sessionInfo()
R version 2.7.1 (2008-06-23)
i386-apple-darwin8.10.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] grid_2.7.1     lattice_0.17-8 nlme_3.1-89
>



