[R] different interface to by (tapply)?

William Dunlap wdunlap at tibco.com
Mon Aug 30 16:56:00 CEST 2010


Have you tried aggregate() or plyr's ddply()?
by() is meant for functions whose return values
are too complicated to combine automatically
(e.g., lm()).  aggregate() works for functions
that return scalars or simple vectors, and it
returns a data.frame.  ddply() is part of plyr's
family of apply functions with a uniform interface;
it takes a data.frame and returns a data.frame.

I didn't notice any sample data, so I made some
up.  Your by() call didn't work with it (when the
first argument to by() is a data.frame, FUN gets a
data.frame for each group, not a numeric vector),
so perhaps you need something else.
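
A minimal sketch of that point, using a small made-up data frame (an
assumption, since the original post had no data). The names pieces and
res are only for illustration. It shows what FUN receives when by() is
given the whole data frame, and one way to hand mean() and sd() a
numeric vector instead:

```r
# Made-up stand-in data; the original post included none.
indf <- data.frame(charid = c("A", "A", "A", "B", "A", "B"), x = 11:16)

# With a data frame as the first argument, FUN receives a data frame
# (all columns for that group), not a numeric vector.
pieces <- by(indf, indf$charid, function(d) class(d))

# Selecting the numeric column inside FUN gives mean() and sd() a vector.
res <- by(indf, indf$charid, function(d) c(m = mean(d$x), s = sd(d$x)))
```

Passing indf$x as the first argument, as in the transcript, has the same
effect as selecting the column inside FUN.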

> indf <- data.frame(charid = c("A","A","A","B","A","B"), x = 11:16)
> by(indf$x, indf$charid, function(x) c(m = mean(x), s = sd(x)))
indf$charid: A
        m         s 
12.750000  1.707825 
------------------------------------------------------------ 
indf$charid: B
        m         s 
15.000000  1.414214 
> library(plyr)
> ddply(indf, .variables = .(charid), .fun = function(df) c(m = mean(df$x), s = sd(df$x)))
  charid     m        s
1      A 12.75 1.707825
2      B 15.00 1.414214
> str(.Last.value)
'data.frame':   2 obs. of  3 variables:
 $ charid: Factor w/ 2 levels "A","B": 1 2
 $ m     : num  12.8 15
 $ s     : num  1.71 1.41
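
For comparison, here is a rough base-R sketch of the other two routes:
aggregate(), and flattening the by() result directly, which is what the
original question asked for. It assumes the same made-up indf, and the
names agg, agg_flat, flat and by_df are only for illustration.

```r
# Same made-up data as in the transcript.
indf <- data.frame(charid = c("A", "A", "A", "B", "A", "B"), x = 11:16)

# aggregate(): one row per group; FUN returns a named vector.
agg <- aggregate(x ~ charid, data = indf,
                 FUN = function(x) c(m = mean(x), s = sd(x)))
# The summaries come back as a single matrix column named "x";
# do.call(data.frame, ...) splits it into columns x.m and x.s.
agg_flat <- do.call(data.frame, agg)

# Flattening by() output: each element is a named numeric vector,
# so rbind() stacks them into a matrix with one row per group.
b <- by(indf$x, indf$charid, function(x) c(m = mean(x), s = sd(x)))
flat <- do.call(rbind, b)
by_df <- data.frame(charid = rownames(flat), flat)
rownames(by_df) <- NULL
```

The rbind() route only makes sense when every group returns a vector of
the same length with the same names; for anything more complicated the
by() object is probably best left as it is.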

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of ivo welch
> Sent: Monday, August 30, 2010 6:19 AM
> To: r-help
> Subject: [R] different interface to by (tapply)?
> 
> dear R experts:
> 
> has someone written a function that returns the results of by() as a
> data frame?   of course, this can work only if the output of the
> function that is an argument to by() is a numerical vector.
> presumably, what is now names(byobject) would become a column in the
> data frame, and the by object's list elements would become columns.
> it's a little bit like flattening the by() output object (so that the
> name of the list item and its contents become the same row), and
> having the right names for the columns.  I don't know how to do this
> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
> the list of vectors, is easy, but would not make a nice function for
> me to use often.)
> 
> for example, let's say my by() output is currently
> 
> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
> 
> $`A`
> [1] 2 3
> $`B`
> [1] 4 5
> 
> then the revised by() would instead produce
> 
> charid  m  s
> A       2  3
> B       4  5
> 
> working with data frames is often more intuitive than working with the
> output of by().  the R wizards are probably chuckling now about how
> easy this is...
> 
> regards,
> 
> /iaw
> 
> ----
> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)


