[R] different interface to by (tapply)?
ivo welch
ivo.welch at gmail.com
Mon Aug 30 16:50:15 CEST 2010
perfect. this is the R way to do it quickly and easily. thank you, marc.
(PS, in my earlier example, what I wanted was aggregate( . ~ key,
data=indf, FUN = function(x) c(m=mean(x), s=sd(x))) )
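
One thing to watch with this call: when FUN returns a vector, aggregate() stores each summarized variable as a matrix column, so the result has fewer top-level columns than the printout suggests. A minimal sketch of flattening it with do.call(data.frame, ...), assuming made-up data (the column names v1 and v2 and all values are illustrative, not from the thread):

```r
# Illustrative data only; `indf` and `key` mirror the names used in the
# message above, while `v1`, `v2`, and the values are assumptions.
indf <- data.frame(key = c("A", "A", "B", "B"),
                   v1  = c(1, 3, 5, 9),
                   v2  = c(2, 2, 4, 8))

res <- aggregate(. ~ key, data = indf,
                 FUN = function(x) c(m = mean(x), s = sd(x)))

# res$v1 and res$v2 are each two-column matrices, so res has 3 columns.
ncol(res)

# Rebuilding the data frame from its columns splits each matrix into
# ordinary columns named <variable>.<statistic>.
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, columns such as flat$v1.m can be used like any other numeric column.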
----
Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
On Mon, Aug 30, 2010 at 10:47 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
> FYI, since R version 2.11.0, aggregate() can return a vector of summary results, rather than just a scalar:
>
> > aggregate(iris$Sepal.Length, list(Species = iris$Species),
>             function(x) c(Mean = mean(x), SD = sd(x)))
>      Species    x.Mean      x.SD
> 1     setosa 5.0060000 0.3524897
> 2 versicolor 5.9360000 0.5161711
> 3  virginica 6.5880000 0.6358796
>
>
> There is also now a formula interface:
>
> > aggregate(. ~ Species, data = iris,
>             FUN = function(x) c(Mean = mean(x), SD = sd(x)))
>      Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
> 1     setosa         5.0060000       0.3524897        3.4280000
> 2 versicolor         5.9360000       0.5161711        2.7700000
> 3  virginica         6.5880000       0.6358796        2.9740000
>   Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
> 1      0.3790644         1.4620000       0.1736640        0.2460000
> 2      0.3137983         4.2600000       0.4699110        1.3260000
> 3      0.3224966         5.5520000       0.5518947        2.0260000
>   Petal.Width.SD
> 1      0.1053856
> 2      0.1977527
> 3      0.2746501
>
>
> HTH,
>
> Marc Schwartz
>
>
> On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:
>
>> Try this:
>>
>> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
>>
>> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch <ivo.welch at gmail.com> wrote:
>>
>>> dear R experts:
>>>
>>> has someone written a function that returns the results of by() as a
>>> data frame? of course, this can work only if the output of the
>>> function that is an argument to by() is a numerical vector.
>>> presumably, what is now names(byobject) would become a column in the
>>> data frame, and the by object's list elements would become columns.
>>> it's a little bit like flattening the by() output object (so that the
>>> name of the list item and its contents become the same row), and
>>> having the right names for the columns. I don't know how to do this
>>> quickly in the R way. (Doing it slowly, e.g., with a for loop over
>>> the list of vectors, is easy, but would not make a nice function for
>>> me to use often.)
>>>
>>> for example, let's say my by() output is currently
>>>
>>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>>>
>>> $`A`
>>> [1] 2 3
>>> $`B`
>>> [1] 4 5
>>>
>>> then the revised by() would instead produce
>>>
>>> charid m s
>>> A 2 3
>>> B 4 5
>>>
>>> working with data frames is often more intuitive than working with the
>>> output of by(). the R wizards are probably chuckling now about how
>>> easy this is...
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>> ----
>>> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
>
>
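
For the original question at the bottom of the thread (turning by() output into a data frame with one row per group), one possible approach is to stack the per-group vectors with do.call(rbind, ...) and then move the group labels into a column. This is only a sketch under assumptions: the column val and the data values are made up, and the function picks the numeric column explicitly because by() passes each group to FUN as a data frame.

```r
# Hypothetical data; `indf` and `charid` follow the names in the question,
# `val` and the values are illustrative assumptions.
indf <- data.frame(charid = c("A", "A", "B", "B"),
                   val    = c(1, 3, 5, 9))

# Each group arrives as a data frame `d`, so summarize d$val rather than d.
byobj <- by(indf, indf$charid,
            function(d) c(m = mean(d$val), s = sd(d$val)))

# Stack the named vectors into a matrix with one row per group, then turn
# the row names (the group labels) into an ordinary column.
mat <- do.call(rbind, byobj)
out <- data.frame(charid = rownames(mat), mat, row.names = NULL)
out
```

This relies on FUN returning a numeric vector of the same length and names for every group, as the question already assumes.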