[R] ddply with mean and max...

Dennis Murphy djmuser at gmail.com
Wed May 11 19:50:47 CEST 2011


Hi:

Try this:

> test.set <- data.frame(site = 1:10, x = .Random.seed[1:100], y = rnorm(100))
> str(test.set)
'data.frame':   100 obs. of  3 variables:
 $ site: int  1 2 3 4 5 6 7 8 9 10 ...
 $ x   : int  403 10 -74327032 10380982 -951011855 1368411171 -390937486 -1081698620 -812257145 -1354214307 ...
 $ y   : num  -0.414 -0.851 -1.67 -0.315 1.934 ...

# numcolwise() summarizes every numeric column, so if site stays an integer
# it gets summarized along with x and y. Converting site to a factor keeps
# it out of the calculation:
> test.set$site <- factor(test.set$site)
> ddply(test.set, .(site), numcolwise(mean))
   site          x           y
1     1 -207083133 -0.01895802
2     2  321488067  0.19581351
3     3   46121295 -0.41734140
4     4  321915795 -0.08254519
5     5 -416497845 -0.10543154
6     6  -27745056 -0.38855565
7     7  515863199 -0.54731714
8     8  412917654  0.05438913
9     9 -327132515  0.26896930
10   10   74689545 -0.45381880
> ddply(test.set, .(site), numcolwise(max))
   site          x         y
1     1 1997725565 0.8473888
2     2 2018830674 1.6600380
3     3 1909893732 2.4445523
4     4 1365543339 1.3697428
5     5 1688291226 2.2145275
6     6 1368411171 1.5141589
7     7 1974894876 1.2868469
8     8 2054615743 0.7917823
9     9 1091060578 2.4678820
10   10 2055409475 2.4488190
> ddply(test.set, .(site), numcolwise(min))

<snipped - same idea>
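For the curious, here is a rough base-R imitation of what numcolwise() does. numcolwise_sketch() is a made-up name (it is not part of plyr) and the two-row piece is invented; the point is only to show why a factor site drops out: just the columns that pass is.numeric() get summarized.

```r
# Hypothetical helper, NOT part of plyr: a rough imitation of numcolwise(),
# which applies fun only to the numeric columns of a data frame
numcolwise_sketch <- function(fun) {
  function(df) {
    keep <- vapply(df, is.numeric, logical(1))  # factors are not numeric
    as.data.frame(lapply(df[keep], fun))
  }
}

# Invented stand-in for one site's rows, with site already a factor
piece <- data.frame(site = factor(c(3, 3)), x = c(5, -3), y = c(0.2, 0.9))
numcolwise_sketch(max)(piece)  # columns x and y only; site is skipped
```

Here the result is a one-row data frame with x = 5 and y = 0.9.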

I imagine you'd want all of these together, so an easier way in ddply()
is to write a function that takes a data frame (the rows for one site)
and returns a one-row data frame of summaries, as follows:

f <- function(d) data.frame(mean.x = mean(d$x), mean.y = mean(d$y),
                            min.x = min(d$x), min.y = min(d$y),
                            max.x = max(d$x), max.y = max(d$y))
ddply(test.set, .(site), f)
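Conceptually, ddply() is doing split-apply-combine here. A base-R sketch of the same idea (not plyr's actual implementation) might look like the following; the data are simulated with rnorm() instead of .Random.seed, as an assumption, so the block runs in a fresh session where .Random.seed may not exist yet.

```r
# Simulated data shaped like test.set (assumption: x from rnorm(), not .Random.seed)
set.seed(42)
test.set <- data.frame(site = factor(rep(1:10, length.out = 100)),
                       x = rnorm(100), y = rnorm(100))

# Same summary function as above: one-row data frame per piece
f <- function(d) data.frame(mean.x = mean(d$x), mean.y = mean(d$y),
                            min.x = min(d$x), min.y = min(d$y),
                            max.x = max(d$x), max.y = max(d$y))

# Split by site, apply f to each piece, then stack the one-row results
pieces <- split(test.set, test.set$site)
res <- data.frame(site = names(pieces),
                  do.call(rbind, lapply(pieces, f)),
                  row.names = NULL)
res
```

The grouping variable is put back as the first column, much like the site column ddply() adds to its output.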

In this case, aggregate() would be a little bit simpler (R >= 2.11.0):

aggregate(cbind(x, y) ~ site, data = test.set,
          FUN = function(x) c(mean = mean(x), min = min(x), max = max(x)))
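One thing to be aware of: when FUN returns a vector, aggregate() stores x and y as matrix columns (with colnames mean, min, max). If you want ordinary columns, one common trick is do.call(data.frame, ...). A sketch, again on simulated data (an assumption, so the block is self-contained):

```r
# Simulated data shaped like test.set (assumption: x drawn from rnorm())
set.seed(1)
test.set <- data.frame(site = factor(rep(1:10, length.out = 100)),
                       x = rnorm(100), y = rnorm(100))

agg <- aggregate(cbind(x, y) ~ site, data = test.set,
                 FUN = function(v) c(mean = mean(v), min = min(v), max = max(v)))

# x and y come back as 3-column matrices; flatten them into plain columns
flat <- do.call(data.frame, agg)
names(flat)
```

The flattened columns should be named site, x.mean, x.min, x.max, y.mean, y.min and y.max.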

On Wed, May 11, 2011 at 9:46 AM, Justin <jtor14 at gmail.com> wrote:
> I'm trying to use ddply to compute summary statistics for many variables,
> splitting on the variable site. However, it seems to work fine for mean(), but
> if I use max() or min() things fall apart. What's going on?
>

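The problem in your code is that you don't tell ddply() which columns
mean/min/max should be applied to. ddply() passes each function the whole
sub-data frame for a site. mean() applied to a data frame averages each
column, so that happened to work, but max() and min() collapse the entire
data frame (site, x and y together) into a single number, which is why you
get one column called V1.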
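A small base-R illustration of the difference (the two-row piece is made up). Note that the data frame method for mean() that made your mean call work was later deprecated and then removed from R, so colMeans() or sapply() is the safer column-wise choice.

```r
# Hypothetical stand-in for one site's piece of test.set
piece <- data.frame(site = c(1L, 1L), x = c(5, -3), y = c(0.2, 0.9))

# max() on a whole data frame pools every column (site included)
# into one number; this is where the lone V1 column comes from
max(piece)

# Summarizing column by column keeps x and y apart
sapply(piece[c("x", "y")], max)
colMeans(piece[c("x", "y")])
```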

HTH,
Dennis

>  test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100))
>  means<-ddply(test.set,.(site),mean)
>  means
>   site          x           y
> 1     1  -97459496 -0.14826303
> 2     2 -150246922 -0.29279556
> 3     3  471813178  0.13090210
> 4     4 -655451621  0.07908207
> 5     5 -229505843  0.10239588
> 6     6 -667025397 -0.34930275
> 7     7  510041943  0.20547460
> 8     8  270993292 -0.63658199
> 9     9  264989314  0.09695455
> 10   10 -199965142 -0.07202699
>  maxes<-ddply(test.set,.(site),max)
>  maxes
>   site         V1
> 1     1 1942437227
> 2     2 2066224792
> 3     3 2146619846
> 4     4 1381954134
> 5     5 1802867123
> 6     6 1786627153
> 7     7 1951106534
> 8     8 1498358582
> 9     9 2022046126
> 10   10 1670904926
>
> Can you all shed some light on this? I'm stumped!
>
> Thanks,
> Justin
>


