[R] how to create data.frames from vectors with duplicates
William Dunlap
wdunlap at tibco.com
Wed Aug 31 19:40:19 CEST 2011
I'll put in a plug for vapply().
> # 100,000 numbers in 17576 groups:
> y <- rep(do.call(paste, c(list(sep=""), expand.grid(LETTERS,letters,letters))), length.out=1e5)
> x <- seq_along(y)^2
> system.time(val.vapply <- vapply(split(x, y), FUN=sum, FUN.VALUE=0))
   user  system elapsed
   0.18    0.02    0.20
> system.time(val.rowsum <- rowsum(x, y))
   user  system elapsed
   0.14    0.00    0.15
> system.time(val.tapply <- tapply(x, y, sum))
   user  system elapsed
   0.40    0.00    0.41
> all(val.vapply==val.rowsum)
[1] TRUE
> all(val.vapply==val.tapply)
[1] TRUE
S+ has fast functions groupSums, groupProds, etc. (one for
each of the standard summary functions) to deal with this
sort of thing.
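
For the small example in the original question, a minimal sketch of
building the requested data.frame might look like the following. The
object names xy.rowsum, xy.vapply and sums are chosen here for
illustration only; the column name count follows the original post.

```r
# Small example from the original question
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "a", "c")

# rowsum() returns a one-column matrix whose row names are the group labels;
# taking the first column gives a named numeric vector
xy.rowsum <- data.frame(count = rowsum(x, y)[, 1])

# vapply() over split() also returns a named numeric vector of group sums
sums <- vapply(split(x, y), FUN = sum, FUN.VALUE = 0)
xy.vapply <- data.frame(count = sums)

print(xy.rowsum)
```

In both cases data.frame() takes its row names from the names of the
vector, so the group labels end up as row names, as in the layout
requested.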
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
> Sent: Wednesday, August 31, 2011 10:10 AM
> To: Henrique Dallazuanna
> Cc: r-help; zhenjiang xu
> Subject: Re: [R] how to create data.frames from vectors with duplicates
>
> For the record, Henrique's use of rowsum() is about 10 times faster
> than using tapply (and presumably anything with table() ) on my
> computer. It calls a C primitive.
>
> -- Bert
>
> On Wed, Aug 31, 2011 at 9:55 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> > Try this:
> >
> > rowsum(x, y)
> >
> > On Wed, Aug 31, 2011 at 1:45 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> >>
> >> Hi R users,
> >>
> >> suppose I have two vectors,
> >> > x=c(1,2,3,4,5)
> >> > y=c('a','b','c','a','c')
> >> How can I get a data.frame like this?
> >> > xy
> >>   count
> >> a     5
> >> b     2
> >> c     8
> >>
> >> I know a few ways to do this. However, I have a huge number of
> >> calculations of this kind, so I'd like an efficient solution. Thanks
> >>
> >> --
> >> Best,
> >> Zhenjiang
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Henrique Dallazuanna
> > Curitiba-Paraná-Brasil
> > 25° 25' 40" S 49° 16' 22" O