[Rd] Varying as.Date performance

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 5 07:02:33 CEST 2005


On 5/5/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On 5/4/05, Jeff Enos <jeff at kanecap.com> wrote:
> > R-devel,
> >
> > The performance of as.Date differs by a large degree between one of my
> > machines with glibc 2.3.2:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 1.17 0.00 1.18 0.00 0.00
> >
> > and a comparable machine with glibc 2.3.3:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 31.20 46.89 81.01  0.00  0.00
> >
> > both with the same R version:
> >
> > > R.version
> >         _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status
> > major    2
> > minor    1.0
> > year     2005
> > month    04
> > day      18
> > language R
> >
> > I'm focusing on differences in glibc versions because of as.Date's use
> > of strptime.
> >
> > Does it seem likely that the cause of this discrepancy is in fact
> > glibc?  If so, can anyone tell me how to make the performance of the
> > second machine more like the first?
> >
> > I have verified that using the chron package, which I don't believe
> > uses strptime, for the above character conversion performs equally
> > well on both machines.
> 
> I think it's likely the character processing that is the bottleneck.  You
> can speed that part up by extracting the substrings directly:
> 
> > system.time({
> + dd <- rep("01-01-2005", 10000)
> + year <- as.numeric(substr(dd, 7, 10))
> + mon <- as.numeric(substr(dd, 1, 2))
> + day <- as.numeric(substr(dd, 4, 5))
> + x <- as.Date(ISOdate(year, mon, day))
> + }, gc = TRUE)
> [1] 0.42 0.00 0.51   NA   NA
> 
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
> [1] 1.08 0.00 1.22   NA   NA
> 

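Sorry, but I got the number of zeros in the rep calls wrong: the substr
version above ran on 10000 elements while the as.Date version ran on
100000.  On equal-sized input the substr approach is actually slower.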
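
For anyone repeating the comparison, here is a minimal sketch that times both
conversions on the same input. The vector length n and the variable names
t_format, t_substr, x1 and x2 are my own choices for illustration, not
anything from the original runs, and the timings will of course vary by
machine and C library.

```r
# Use one input vector for both approaches so the timings are comparable.
# n is an arbitrary choice for this sketch.
n <- 100000
dd <- rep("01-01-2005", n)

# Approach 1: let as.Date parse each string using the format specification.
t_format <- system.time(
  x1 <- as.Date(dd, format = "%m-%d-%Y")
)

# Approach 2: extract the fields by position and build the dates from numbers.
t_substr <- system.time({
  year <- as.numeric(substr(dd, 7, 10))
  mon  <- as.numeric(substr(dd, 1, 2))
  day  <- as.numeric(substr(dd, 4, 5))
  x2 <- as.Date(ISOdate(year, mon, day))
})

# Both approaches should produce the same dates.
stopifnot(all(x1 == x2))

# Show the two timings side by side.
print(rbind(format = t_format, substr = t_substr))
```

Building dd once and reusing it avoids the mismatch in input sizes, and the
equality check guards against comparing two conversions that do not agree.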


