[R] Calculating subsets "on the fly" with ddply
hadley wickham
h.wickham at gmail.com
Thu Feb 4 04:44:49 CET 2010
> The ddply invocation would look like so:
>
> R> my <- ddply(iris, .(w=Sepal.Length < 5.5, Species), transform,
> grmean=mean(Petal.Width))
> R> head(my)
>       w Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   grmean
> 1 FALSE          5.8         4.0          1.2         0.2     setosa 0.260000
> 2 FALSE          5.7         4.4          1.5         0.4     setosa 0.260000
> 3 FALSE          5.7         3.8          1.7         0.3     setosa 0.260000
> 4 FALSE          5.5         4.2          1.4         0.2     setosa 0.260000
> 5 FALSE          5.5         3.5          1.3         0.2     setosa 0.260000
> 6 FALSE          7.0         3.2          4.7         1.4 versicolor 1.347727
>
>
> Although this appears to work, I'm not sure if the .(w= ...) is
> correct. Is that how it should be done?
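Yes, that's a deliberate design feature: a named expression inside .() is computed from the data and becomes a new grouping column (here, w).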
> [Not relevant to this question, but I believe I need the w column for
> a successful downstream call to ggplot. Anyway ...]
>
> Now, I want 5.5 to be passed in "on the fly"... and this works:
>
> R> val <- 5.5
> R> my.2 <- ddply(iris, .(w=Sepal.Length < val, Species), transform,
> grmean=mean(Petal.Width))
> R> identical(my.2, my)
> [1] TRUE
>
> But what I really want is for this to be part of a function that lets
> me pick any value for `val`... and this doesn't work:
>
> my.function <- function(df, my.val) {
>   ddply(df, .(w=Sepal.Length < my.val, Species), transform,
>         grmean=mean(Petal.Width))
> }
>
> R> my.function(iris, 5.5)
> Error in eval(expr, envir, enclos) : object 'my.val' not found
>
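The error is raised inside eval(expr, envir, enclos). One plausible reading, assuming (this is not confirmed in the thread) that the plyr version used here evaluated the .() expressions with the data frame as `envir` and an enclosing environment other than the calling function's frame, can be reproduced in base R. The helpers eval_in_global and eval_in_caller are hypothetical names used only for illustration:

```r
# Base-R illustration of the scoping issue (hypothetical helper names).
# The expression is evaluated with the data frame as `envir`; any name
# that is not a column is then looked up in `enclos`.
cond <- quote(Sepal.Length < my.val)

eval_in_global <- function(df, my.val) {
  # enclos = globalenv(): the local argument my.val is not visible here
  eval(cond, df, globalenv())
}

eval_in_caller <- function(df, my.val) {
  # enclos = environment(): this function's own frame, so my.val is found
  eval(cond, df, environment())
}

res <- tryCatch(eval_in_global(iris, 5.5),
                error = function(e) conditionMessage(e))
print(res)  # an "object 'my.val' not found"-style message

w <- eval_in_caller(iris, 5.5)
table(w)
```

This also suggests why the global `val` version works: a variable defined at top level is found in the global environment, while a function argument is not.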
> I can work around this by editing `df` in my function to add a w
> column first:
>
> my.function2 <- function(df, my.val) {
> df$w <- df$Sepal.Length < my.val
> ddply(df, .(w, Species), transform, grmean=mean(Petal.Width))
> }
>
> R> my2 <- my.function2(iris, 5.5)
> R> head(my2)
>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species     w   grmean
> 1          5.8         4.0          1.2         0.2     setosa FALSE 0.260000
> 2          5.7         4.4          1.5         0.4     setosa FALSE 0.260000
> 3          5.7         3.8          1.7         0.3     setosa FALSE 0.260000
> 4          5.5         4.2          1.4         0.2     setosa FALSE 0.260000
> 5          5.5         3.5          1.3         0.2     setosa FALSE 0.260000
> 6          7.0         3.2          4.7         1.4 versicolor FALSE 1.347727
>
> Is that the "right" way to do it?
That's what I'd recommend.
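For comparison, the same "add the w column first" approach can be sketched in base R, with ave() swapped in for ddply. This is an illustrative alternative, not plyr's API; add_group_mean is a hypothetical name. Unlike ddply, ave() keeps the original row order, so the result is sorted by w and Species before printing:

```r
# Base-R sketch of the recommended workaround (ave() swapped in for ddply).
# add_group_mean is a hypothetical helper name.
add_group_mean <- function(df, my.val) {
  # Add the grouping column first, exactly as in my.function2
  df$w <- df$Sepal.Length < my.val
  # Mean Petal.Width within each (w, Species) group, repeated for every row
  df$grmean <- ave(df$Petal.Width, df$w, df$Species, FUN = mean)
  df
}

my2 <- add_group_mean(iris, 5.5)
# order() is stable, so rows keep their original order within each group
head(my2[order(my2$w, my2$Species), ])
```

Because the threshold is an ordinary function argument used in ordinary code, there is no scoping question to worry about.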
Hadley
--
http://had.co.nz/
More information about the R-help mailing list