[R] how to create data.frames from vectors with duplicates

William Dunlap wdunlap at tibco.com
Wed Aug 31 19:40:19 CEST 2011


I'll put in a plug for vapply().

  > # 100,000 numbers in 17576 groups:
  > y <- rep(do.call(paste, c(list(sep=""), expand.grid(LETTERS,letters,letters))), length=1e5)
  > x <- seq_along(y)^2
  > system.time(val.vapply <- vapply(split(x, y), FUN=sum, FUN.VALUE=0))
     user  system elapsed 
     0.18    0.02    0.20 
  > system.time(val.rowsum <- rowsum(x, y))
     user  system elapsed 
     0.14    0.00    0.15 
  > system.time(val.tapply <- tapply(x, y, sum))
     user  system elapsed 
     0.40    0.00    0.41 
  > all(val.vapply==val.rowsum)
  [1] TRUE
  > all(val.vapply==val.tapply)
  [1] TRUE

S+ has fast functions groupSums, groupProds, etc. (one for
each of the standard summary functions) to deal with this
sort of thing.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
> Sent: Wednesday, August 31, 2011 10:10 AM
> To: Henrique Dallazuanna
> Cc: r-help; zhenjiang xu
> Subject: Re: [R] how to create data.frames from vectors with duplicates
> 
> For the record, Henrique's use of rowsum() is about 10 times faster
> than using tapply() (and presumably anything using table()) on my
> computer. It calls compiled C code.
> 
> -- Bert
> 
> On Wed, Aug 31, 2011 at 9:55 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> > Try this:
> >
> > rowsum(x, y)
> >
> > On Wed, Aug 31, 2011 at 1:45 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> >>
> >> Hi R users,
> >>
> >> Suppose I have two vectors,
> >>  > x=c(1,2,3,4,5)
> >>  > y=c('a','b','c','a','c')
> >> How can I get a data.frame like this?
> >> > xy
> >>      count
> >> a     5
> >> b     2
> >> c     8
> >>
> >> I know a few ways to do this. However, I have a huge number
> >> of calculations of this kind, so I'd like an efficient solution. Thanks
> >>
> >> --
> >> Best,
> >> Zhenjiang
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Henrique Dallazuanna
> > Curitiba-Paraná-Brasil
> > 25° 25' 40" S 49° 16' 22" O
> >
> >
> 
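Here is a minimal sketch of turning Henrique's rowsum(x, y) into the data.frame shown in the original question. rowsum() returns a one-column matrix whose row names are the sorted group labels. Taking that column and passing it to data.frame() under the name count (the label used in the question) gives the requested layout, with the group labels as row names.

```r
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "a", "c")

# rowsum() sums x within each group of y; the result is a
# 1-column matrix with row names "a", "b", "c"
sums <- rowsum(x, y)

# sums[, 1] is a vector named by the row names; data.frame()
# uses those names as row names for the new data.frame
xy <- data.frame(count = sums[, 1])
print(xy)
#   count
# a     5
# b     2
# c     8
```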
