[Rd] Varying as.Date performance
Gabor Grothendieck
ggrothendieck at gmail.com
Thu May 5 07:02:33 CEST 2005
On 5/5/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On 5/4/05, Jeff Enos <jeff at kanecap.com> wrote:
> > R-devel,
> >
> > The performance of as.Date differs by a large degree between one of my
> > machines with glibc 2.3.2:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 1.17 0.00 1.18 0.00 0.00
> >
> > and a comparable machine with glibc 2.3.3:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 31.20 46.89 81.01 0.00 0.00
> >
> > both with the same R version:
> >
> > > R.version
> >          _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status
> > major    2
> > minor    1.0
> > year     2005
> > month    04
> > day      18
> > language R
> >
> > I'm focusing on differences in glibc versions because of as.Date's use
> > of strptime.
> >
> > Does it seem likely that the cause of this discrepancy is in fact
> > glibc? If so, can anyone tell me how to make the performance of the
> > second machine more like the first?
> >
> > I have verified that using the chron package, which I don't believe
> > uses strptime, for the above character conversion performs equally
> > well on both machines.
>
> I think it's likely that the character processing is the bottleneck. You
> can speed that part up by extracting the substrings directly:
>
> > system.time({
> + dd <- rep("01-01-2005", 10000)
> + year <- as.numeric(substr(dd, 7, 10))
> + mon <- as.numeric(substr(dd, 1, 2))
> + day <- as.numeric(substr(dd, 4, 5))
> + x <- as.Date(ISOdate(year, mon, day))
> + }, gc = TRUE)
> [1] 0.42 0.00 0.51 NA NA
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
> [1] 1.08 0.00 1.22 NA NA
>
Sorry, but I got the number of zeros in the reps wrong: the substring timing
used rep(..., 10000) while the as.Date timing used rep(..., 100000). It's
actually slower.
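A like-for-like comparison has to time both approaches on the same vector. The
following is a minimal sketch, not code from the original thread. The names
`n`, `dd`, `x1`, `x2`, `t_strptime` and `t_substr` are illustrative. No timings
are given because they depend on the machine and the glibc version:

```r
# Illustrative benchmark: both approaches parse the same n strings.
n  <- 100000
dd <- rep("01-01-2005", n)

# Approach 1: as.Date with a format string (parsed via strptime).
t_strptime <- system.time(
  x1 <- as.Date(dd, format = "%m-%d-%Y")
)

# Approach 2: extract year/month/day with substr, then build dates via ISOdate.
t_substr <- system.time({
  year <- as.numeric(substr(dd, 7, 10))
  mon  <- as.numeric(substr(dd, 1, 2))
  day  <- as.numeric(substr(dd, 4, 5))
  x2   <- as.Date(ISOdate(year, mon, day))
})

# Sanity check: both approaches must yield the same dates.
stopifnot(all(x1 == x2))

# Report the timings for each approach.
print(t_strptime)
print(t_substr)
```

Keeping n identical in both arms is the only way to rank the two approaches.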