[R] different interface to by (tapply)?

Mon Aug 30 16:47:24 CEST 2010

FYI, since R version 2.11.0, the function passed to aggregate() can return a vector of summary results for each group, rather than just a scalar:

> aggregate(iris$Sepal.Length, list(Species = iris$Species), 
            function(x) c(Mean = mean(x), SD = sd(x)))
     Species    x.Mean      x.SD
1     setosa 5.0060000 0.3524897
2 versicolor 5.9360000 0.5161711
3  virginica 6.5880000 0.6358796
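
One thing to watch for: the two statistics come back as a single matrix column named x, so the result has two columns, not three. Here is a minimal sketch of one way to flatten it into ordinary columns. The object names agg and flat are just ones I picked for illustration:

```r
# Summarise Sepal.Length by Species; FUN returns a named vector of length 2.
agg <- aggregate(iris$Sepal.Length, list(Species = iris$Species),
                 function(x) c(Mean = mean(x), SD = sd(x)))

# Both statistics are stored in one matrix column called "x".
dim(agg)            # 3 2
is.matrix(agg$x)    # TRUE

# do.call(data.frame, ...) splits the matrix into separate columns.
flat <- do.call(data.frame, agg)
names(flat)         # "Species" "x.Mean" "x.SD"
```

Whether you need this step depends on what you do next; printing and indexing with agg$x[, "Mean"] work fine without it.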

There is also now a formula interface:

> aggregate(. ~ Species, data = iris, 
            FUN = function(x) c(Mean = mean(x), SD = sd(x)))
     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
1     setosa         5.0060000       0.3524897        3.4280000
2 versicolor         5.9360000       0.5161711        2.7700000
3  virginica         6.5880000       0.6358796        2.9740000
  Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
1      0.3790644         1.4620000       0.1736640        0.2460000
2      0.3137983         4.2600000       0.4699110        1.3260000
3      0.3224966         5.5520000       0.5518947        2.0260000
  Petal.Width.SD
1      0.1053856
2      0.1977527
3      0.2746501
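
If you would rather stay with by(), the per-group vectors can usually be stacked with do.call(rbind, ...). This is only a rough sketch: iris stands in for indf (which was not posted), Sepal.Length stands in for whichever numeric column is being summarised, and res, mat and out are names I made up:

```r
# by() splits the data frame by Species; each group yields c(m, s).
res <- by(iris, iris$Species,
          function(d) c(m = mean(d$Sepal.Length), s = sd(d$Sepal.Length)))

# Stack the per-group vectors into a matrix, one row per group.
mat <- do.call(rbind, res)

# Promote the group labels (the row names) to a regular column.
out <- data.frame(Species = rownames(mat), mat, row.names = NULL)
out
```

This assumes the function returns a numeric vector of the same length for every group; otherwise rbind() will recycle or complain.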

HTH,

Marc Schwartz

On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:

> Try this:
> 
> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
> 
> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch <ivo.welch at gmail.com> wrote:
> 
>> dear R experts:
>> 
>> has someone written a function that returns the results of by() as a
>> data frame?   of course, this can work only if the output of the
>> function that is an argument to by() is a numerical vector.
>> presumably, what is now names(byobject) would become a column in the
>> data frame, and the by object's list elements would become columns.
>> it's a little bit like flattening the by() output object (so that the
>> name of the list item and its contents become the same row), and
>> having the right names for the columns.  I don't know how to do this
>> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
>> the list of vectors, is easy, but would not make a nice function for
>> me to use often.)
>> 
>> for example, let's say my by() output is currently
>> 
>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>> 
>> $`A`
>> [1] 2 3
>> $`B`
>> [1] 4 5
>> 
>> then the revised by() would instead produce
>> 
>> charid  m  s
>> A       2  3
>> B       4  5
>> 
>> working with data frames is often more intuitive than working with the
>> output of by().  the R wizards are probably chuckling now about how
>> easy this is...
>> 
>> regards,
>> 
>> /iaw
>> 
>> ----
>> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)