[R] different interface to by (tapply)?

ivo welch ivo.welch at gmail.com
Mon Aug 30 16:50:15 CEST 2010


perfect.  this is the R way to do it quick and easy.  thank you, marc.

(PS, in my earlier example, what I wanted was aggregate( . ~ key,
data=indf, FUN = function(x) c(m=mean(x), s=sd(x)))  )
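
A minimal sketch of that call, for anyone reading along. The data frame
below is made up for illustration (the names indf, key, and val are
assumptions, not my real data). One thing worth knowing: when FUN returns
a vector, aggregate() stores the results for each variable as a single
matrix column, even though print() shows them as val.m and val.s. If
ordinary columns are needed, do.call(data.frame, ...) is one way to expand
them.

```r
# Hypothetical data: 'indf', 'key', and 'val' are illustrative names only.
indf <- data.frame(key = c("A", "A", "B", "B"),
                   val = c(1, 3, 4, 8))

# FUN returns a named vector, so each summarised variable
# comes back as one matrix column with columns 'm' and 's'.
res <- aggregate(. ~ key, data = indf,
                 FUN = function(x) c(m = mean(x), s = sd(x)))
ncol(res)        # 2: 'key' plus the matrix column 'val'

# Expand the matrix column into ordinary data frame columns.
flat <- do.call(data.frame, res)
names(flat)      # "key" "val.m" "val.s"
```

After the last step, flat is a plain data frame with one row per key, so
the summary columns can be used like any others (flat$val.m, and so on).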

----
Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)





On Mon, Aug 30, 2010 at 10:47 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
> FYI, since R version 2.11.0, aggregate() can return a vector of summary results, rather than just a scalar:
>
>> aggregate(iris$Sepal.Length, list(Species = iris$Species),
>            function(x) c(Mean = mean(x), SD = sd(x)))
>     Species    x.Mean      x.SD
> 1     setosa 5.0060000 0.3524897
> 2 versicolor 5.9360000 0.5161711
> 3  virginica 6.5880000 0.6358796
>
>
> There is also now a formula interface:
>
>> aggregate(. ~ Species, data = iris,
>            FUN = function(x) c(Mean = mean(x), SD = sd(x)))
>     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
> 1     setosa         5.0060000       0.3524897        3.4280000
> 2 versicolor         5.9360000       0.5161711        2.7700000
> 3  virginica         6.5880000       0.6358796        2.9740000
>  Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
> 1      0.3790644         1.4620000       0.1736640        0.2460000
> 2      0.3137983         4.2600000       0.4699110        1.3260000
> 3      0.3224966         5.5520000       0.5518947        2.0260000
>  Petal.Width.SD
> 1      0.1053856
> 2      0.1977527
> 3      0.2746501
>
>
> HTH,
>
> Marc Schwartz
>
>
> On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:
>
>> Try this:
>>
>> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
>>
>> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch <ivo.welch at gmail.com> wrote:
>>
>>> dear R experts:
>>>
>>> has someone written a function that returns the results of by() as a
>>> data frame?   of course, this can work only if the output of the
>>> function that is an argument to by() is a numerical vector.
>>> presumably, what is now names(byobject) would become a column in the
>>> data frame, and the by object's list elements would become columns.
>>> it's a little bit like flattening the by() output object (so that the
>>> name of the list item and its contents become the same row), and
>>> having the right names for the columns.  I don't know how to do this
>>> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
>>> the list of vectors, is easy, but would not make a nice function for
>>> me to use often.)
>>>
>>> for example, let's say my by() output is currently
>>>
>>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>>>
>>> $`A`
>>> [1] 2 3
>>> $`B`
>>> [1] 4 5
>>>
>>> then the revised by() would instead produce
>>>
>>> charid  m  s
>>> A          2  3
>>> B          4  5
>>>
>>> working with data frames is often more intuitive than working with the
>>> output of by().  the R wizards are probably chuckling now about how
>>> easy this is...
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>> ----
>>> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
>
>


