[R] different interface to by (tapply)?
Marc Schwartz
marc_schwartz at me.com
Mon Aug 30 16:47:24 CEST 2010
FYI, since R version 2.11.0, aggregate() can return a vector of summary results, rather than just a scalar:
> aggregate(iris$Sepal.Length, list(Species = iris$Species),
            function(x) c(Mean = mean(x), SD = sd(x)))
     Species    x.Mean      x.SD
1     setosa 5.0060000 0.3524897
2 versicolor 5.9360000 0.5161711
3  virginica 6.5880000 0.6358796
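One caveat worth checking on your own version: the x.Mean and x.SD columns printed above are, as far as I know, a single matrix column named x rather than two separate columns. A minimal sketch of flattening it into ordinary columns follows; the names res and flat are mine, chosen only for illustration.

```r
# Same call as above, with the result stored so it can be inspected.
res <- aggregate(iris$Sepal.Length, list(Species = iris$Species),
                 function(x) c(Mean = mean(x), SD = sd(x)))

# The data frame has two columns: Species and a 3 x 2 matrix named x.
ncol(res)
is.matrix(res$x)

# Passing the columns to data.frame() splits the matrix into x.Mean and x.SD.
flat <- do.call(data.frame, res)
names(flat)
```

After this step, flat$x.Mean and flat$x.SD can be used like any other column.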
There is also now a formula interface:
> aggregate(. ~ Species, data = iris,
            FUN = function(x) c(Mean = mean(x), SD = sd(x)))
     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
1     setosa         5.0060000       0.3524897        3.4280000
2 versicolor         5.9360000       0.5161711        2.7700000
3  virginica         6.5880000       0.6358796        2.9740000
  Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
1      0.3790644         1.4620000       0.1736640        0.2460000
2      0.3137983         4.2600000       0.4699110        1.3260000
3      0.3224966         5.5520000       0.5518947        2.0260000
  Petal.Width.SD
1      0.1053856
2      0.1977527
3      0.2746501
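If you would rather stay with by(), here is a rough sketch of flattening its output into the charid/m/s layout asked for in the original question. The data frame is hypothetical: indf and charid are taken from the question, while the value column and its numbers are made up for illustration.

```r
# Hypothetical input; only the names 'indf' and 'charid' come from the
# original question, the 'value' column is an assumption.
indf <- data.frame(charid = c("A", "A", "A", "B", "B", "B"),
                   value  = c(1, 2, 3, 2, 4, 6))

# One named vector of summaries per group.
byout <- by(indf, indf$charid,
            function(d) c(m = mean(d$value), s = sd(d$value)))

# Stack the per-group vectors into a matrix (one row per group), then
# turn the group names into an ordinary column.
stats <- do.call(rbind, byout)
out <- data.frame(charid = rownames(stats), stats, row.names = NULL)
out
```

This only works when the function returns a numeric vector of the same length for every group, which is the restriction the original question already noted.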
HTH,
Marc Schwartz
On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:
> Try this:
>
> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
>
> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch <ivo.welch at gmail.com> wrote:
>
>> dear R experts:
>>
>> has someone written a function that returns the results of by() as a
>> data frame? of course, this can work only if the output of the
>> function that is an argument to by() is a numerical vector.
>> presumably, what is now names(byobject) would become a column in the
>> data frame, and the by object's list elements would become columns.
>> it's a little bit like flattening the by() output object (so that the
>> name of the list item and its contents become the same row), and
>> having the right names for the columns. I don't know how to do this
>> quickly in the R way. (Doing it slowly, e.g., with a for loop over
>> the list of vectors, is easy, but would not make a nice function for
>> me to use often.)
>>
>> for example, let's say my by() output is currently
>>
>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>>
>> $`A`
>> [1] 2 3
>> $`B`
>> [1] 4 5
>>
>> then the revised by() would instead produce
>>
>> charid m s
>> A 2 3
>> B 4 5
>>
>> working with data frames is often more intuitive than working with the
>> output of by(). the R wizards are probably chuckling now about how
>> easy this is...
>>
>> regards,
>>
>> /iaw
>>
>> ----
>> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)