[R] Calculating subsets "on the fly" with ddply

hadley wickham h.wickham at gmail.com
Thu Feb 4 04:44:49 CET 2010


> The ddply invocation would look like so:
>
> R> my <- ddply(iris, .(w=Sepal.Length < 5.5, Species), transform,
> grmean=mean(Petal.Width))
> R> head(my)
>      w Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   grmean
> 1 FALSE          5.8         4.0          1.2         0.2     setosa 0.260000
> 2 FALSE          5.7         4.4          1.5         0.4     setosa 0.260000
> 3 FALSE          5.7         3.8          1.7         0.3     setosa 0.260000
> 4 FALSE          5.5         4.2          1.4         0.2     setosa 0.260000
> 5 FALSE          5.5         3.5          1.3         0.2     setosa 0.260000
> 6 FALSE          7.0         3.2          4.7         1.4 versicolor 1.347727
>
>
> Although this appears to work, I'm not sure if the .(w= ...) is
> correct. Is that how it should be done?

Yes, that's a deliberate design feature.
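
A minimal sketch of that feature, for readers who have not seen it: a named expression inside .() becomes a grouping column with that name in the result. The column name `short` and the use of `summarise` to count rows are illustrative assumptions, not something taken from the question.

```r
library(plyr)

# Group by a computed logical (named `short`, a hypothetical name) and by
# Species, then count the rows that fall into each group.
counts <- ddply(iris, .(short = Sepal.Length < 5.5, Species), summarise,
                n = length(Species))

# One row per (short, Species) combination, with `short` as a real column.
print(counts)
```

The named column is then available downstream, for example as a facetting or colour variable in a plot.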

> [Not relevant to this question, but I believe I need the w column for
> a successful downstream call to ggplot, but anyway ...]
>
> Now, I want 5.5 to be passed in "on the fly"... and this works:
>
> R> val <- 5.5
> R> my.2 <- ddply(iris, .(w=Sepal.Length < val, Species), transform,
> grmean=mean(Petal.Width))
> R> identical(my.2, my)
> [1] TRUE
>
> But what I really want is this to be part of some function that lets
> me pick any value for `val` ... this doesn't work:
>
> my.function <- function(df, my.val) {
>  ddply(df, .(w=Sepal.Length < my.val, Species), transform,
>        grmean=mean(Petal.Width))
> }
>
> R> my.function(iris, 5.5)
> Error in eval(expr, envir, enclos) : object 'my.val' not found
>
> I can work around this by editing `df` in my function to add a w
> column first:
>
> my.function2 <- function(df, my.val) {
>  df$w <- df$Sepal.Length < my.val
>  ddply(df, .(w, Species), transform, grmean=mean(Petal.Width))
> }
>
> R> my2 <- my.function2(iris, 5.5)
> R> head(my2)
>  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species     w   grmean
> 1          5.8         4.0          1.2         0.2     setosa FALSE 0.260000
> 2          5.7         4.4          1.5         0.4     setosa FALSE 0.260000
> 3          5.7         3.8          1.7         0.3     setosa FALSE 0.260000
> 4          5.5         4.2          1.4         0.2     setosa FALSE 0.260000
> 5          5.5         3.5          1.3         0.2     setosa FALSE 0.260000
> 6          7.0         3.2          4.7         1.4 versicolor FALSE 1.347727
>
> Is that the "right" way to do it?

That's what I'd recommend.
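
The error in the first version suggests that the expression captured by .() is evaluated somewhere the function's local my.val is not visible. Computing the column inside the function body sidesteps that, because ddply() then only sees ordinary column names. A sketch of the same workaround follows; the names `add_group_mean` and `threshold` are hypothetical, and the ave() comparison is only a base-R sanity check, not part of the suggested approach.

```r
library(plyr)

# Hypothetical helper: `threshold` is evaluated here, in the function body,
# so the grouping passed to ddply() refers only to columns of `df`.
add_group_mean <- function(df, threshold) {
  df$w <- df$Sepal.Length < threshold
  ddply(df, .(w, Species), transform, grmean = mean(Petal.Width))
}

res <- add_group_mean(iris, 5.5)
head(res)

# Sanity check with base R: ave() computes the per-group mean of Petal.Width
# for the same (w, Species) groups and should agree with grmean.
all.equal(res$grmean, ave(res$Petal.Width, res$w, res$Species))
```

Because the threshold is an ordinary argument, the helper can be called with any value, for example add_group_mean(iris, 6).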

Hadley

-- 
http://had.co.nz/
