[R] ddply with mean and max...
Dennis Murphy
djmuser at gmail.com
Wed May 11 19:50:47 CEST 2011
Hi:
Try this:
test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100))
str(test.set)
'data.frame': 100 obs. of 3 variables:
$ site: int 1 2 3 4 5 6 7 8 9 10 ...
$ x : int 403 10 -74327032 10380982 -951011855 1368411171 -390937486 -1081698620 -812257145 -1354214307 ...
$ y : num -0.414 -0.851 -1.67 -0.315 1.934 ...
# It's easier to use numcolwise() if the grouping variables are not numeric,
# so change site to be a factor variable:
> test.set$site <- factor(test.set$site)
> ddply(test.set, .(site), numcolwise(mean))
site x y
1 1 -207083133 -0.01895802
2 2 321488067 0.19581351
3 3 46121295 -0.41734140
4 4 321915795 -0.08254519
5 5 -416497845 -0.10543154
6 6 -27745056 -0.38855565
7 7 515863199 -0.54731714
8 8 412917654 0.05438913
9 9 -327132515 0.26896930
10 10 74689545 -0.45381880
> ddply(test.set, .(site), numcolwise(max))
site x y
1 1 1997725565 0.8473888
2 2 2018830674 1.6600380
3 3 1909893732 2.4445523
4 4 1365543339 1.3697428
5 5 1688291226 2.2145275
6 6 1368411171 1.5141589
7 7 1974894876 1.2868469
8 8 2054615743 0.7917823
9 9 1091060578 2.4678820
10 10 2055409475 2.4488190
> ddply(test.set, .(site), numcolwise(min))
<snipped - same idea>
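To see why converting site to a factor matters, here is a rough base-R sketch of what a numcolwise()-style wrapper does conceptually: it applies a function to the numeric columns only. The helper name num_cols_apply and the toy data are made up for illustration; this is not plyr's actual implementation.

```r
# Hypothetical helper, for illustration only (not plyr's code):
# apply `fun` to every numeric column of `df` and return a one-row data frame.
num_cols_apply <- function(df, fun) {
  is_num <- vapply(df, is.numeric, logical(1))
  as.data.frame(lapply(df[is_num], fun))
}

# Toy data (assumed values, not from the thread).
toy <- data.frame(site = rep(1:2, each = 3), x = c(1, 5, 3, 2, 8, 4))

# With an integer `site`, the grouping column is summarised as well:
names(num_cols_apply(toy, max))

# After converting `site` to a factor, only `x` counts as numeric:
toy$site <- factor(toy$site)
names(num_cols_apply(toy, max))
```

With the grouping variable stored as a factor, it is simply skipped, so it does not end up summarised alongside x and y.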
I imagine you'd want to put all of this together. An easier way with
ddply() is to write a function that takes a data frame and returns a
data frame, as follows:
f <- function(d) data.frame(mean.x = mean(d$x), mean.y = mean(d$y),
                            min.x = min(d$x), min.y = min(d$y),
                            max.x = max(d$x), max.y = max(d$y))
ddply(test.set, .(site), f)
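For anyone curious what the "data frame in, data frame out" pattern amounts to, the split/apply/combine steps can be sketched in base R as below. This is only an approximation of the idea, not a description of how ddply() works internally; the toy data and the object names pieces, rows and res are assumptions.

```r
# Toy data; column names mirror the thread, values are made up.
set.seed(1)
toy <- data.frame(site = factor(rep(1:3, times = 4)),
                  x = sample(100, 12),
                  y = rnorm(12))

# Same shape as the summary function above: one data frame in,
# one-row data frame out.
f <- function(d) data.frame(mean.x = mean(d$x), mean.y = mean(d$y),
                            min.x = min(d$x), min.y = min(d$y),
                            max.x = max(d$x), max.y = max(d$y))

pieces <- split(toy, toy$site)   # split: one data frame per site
rows   <- lapply(pieces, f)      # apply: one summary row per piece
res    <- do.call(rbind, rows)   # combine: stack the rows

# Put the grouping labels back as a regular column.
res <- data.frame(site = names(pieces), res, row.names = NULL)
res
```

Because f names exactly which columns each statistic refers to, the result has one clearly labelled column per statistic and variable.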
In this case, aggregate() would be a little bit simpler (requires R 2.11.0 or later):
aggregate(cbind(x, y) ~ site, data = test.set,
          FUN = function(x) c(mean = mean(x), min = min(x), max = max(x)))
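One thing to be aware of with this aggregate() call: when FUN returns a vector, each summarised variable comes back as a matrix column rather than as separate columns. The sketch below, on made-up toy data, shows that and one common way to flatten the result; the names agg and flat are illustrative.

```r
# Toy data; column names mirror the thread, values are made up.
set.seed(1)
toy <- data.frame(site = factor(rep(1:3, times = 4)),
                  x = sample(100, 12),
                  y = rnorm(12))

agg <- aggregate(cbind(x, y) ~ site, data = toy,
                 FUN = function(v) c(mean = mean(v), min = min(v), max = max(v)))

# `x` and `y` are each a three-column matrix, so `agg` has only 3 columns:
dim(agg)
colnames(agg$x)

# Flatten the matrix columns into ordinary columns (x.mean, x.min, ...):
flat <- do.call(data.frame, agg)
names(flat)
```

The flattened version is usually easier to work with if the summaries are going to be merged, plotted or written to a file.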
On Wed, May 11, 2011 at 9:46 AM, Justin <jtor14 at gmail.com> wrote:
> I'm trying to use ddply to compute summary statistics for many variables,
> splitting on the variable site. However, it seems to work fine for mean(), but
> if I use max() or min() things fall apart. What's going on?
>
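The problem in your code is that you never tell ddply() which variables
mean/min/max should be applied to: each per-site piece is passed to the
function as a whole data frame. max() then returns one overall value per
site, which is the single V1 column in your output.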
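A small base-R illustration of the difference, using a made-up three-row data frame named piece to stand in for one site's chunk of the data:

```r
# Hypothetical stand-in for the rows belonging to a single site.
piece <- data.frame(x = c(3L, 9L, 1L), y = c(0.5, -1.2, 2.0))

# max() over a whole data frame pools every value into one number:
max(piece)

# Asking for the maximum of each column gives one value per variable:
sapply(piece, max)
```

The first call is effectively what happens to each site's piece when max is handed to ddply() directly, which would explain the lone V1 column in the quoted output.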
HTH,
Dennis
> test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100))
> means<-ddply(test.set,.(site),mean)
> means
> site x y
> 1 1 -97459496 -0.14826303
> 2 2 -150246922 -0.29279556
> 3 3 471813178 0.13090210
> 4 4 -655451621 0.07908207
> 5 5 -229505843 0.10239588
> 6 6 -667025397 -0.34930275
> 7 7 510041943 0.20547460
> 8 8 270993292 -0.63658199
> 9 9 264989314 0.09695455
> 10 10 -199965142 -0.07202699
> maxes<-ddply(test.set,.(site),max)
> maxes
> site V1
> 1 1 1942437227
> 2 2 2066224792
> 3 3 2146619846
> 4 4 1381954134
> 5 5 1802867123
> 6 6 1786627153
> 7 7 1951106534
> 8 8 1498358582
> 9 9 2022046126
> 10 10 1670904926
>
> Can you all shed some light on this? I'm stumped!
>
> Thanks,
> Justin