[Rd] Varying as.Date performance

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 5 07:00:05 CEST 2005


On 5/4/05, Jeff Enos <jeff at kanecap.com> wrote:
> R-devel,
> 
> The performance of as.Date differs by a large degree between one of my
> machines with glibc 2.3.2:
> 
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 1.17 0.00 1.18 0.00 0.00
> 
> and a comparable machine with glibc 2.3.3:
> 
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 31.20 46.89 81.01  0.00  0.00
> 
> both with the same R version:
> 
> > R.version
>         _
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status
> major    2
> minor    1.0
> year     2005
> month    04
> day      18
> language R
> 
> I'm focusing on differences in glibc versions because of as.Date's use
> of strptime.
> 
> Does it seem likely that the cause of this discrepancy is in fact
> glibc?  If so, can anyone tell me how to make the performance of the
> second machine more like the first?
> 
> I have verified that using the chron package, which I don't believe
> uses strptime, for the above character conversion performs equally
> well on both machines.

I think it's likely that the character processing is the bottleneck.  You
can speed that part up by extracting the substrings directly:

> system.time({
+ dd <- rep("01-01-2005", 10000)
+ year <- as.numeric(substr(dd, 7, 10))
+ mon <- as.numeric(substr(dd, 1, 2))
+ day <- as.numeric(substr(dd, 4, 5))
+ x <- as.Date(ISOdate(year, mon, day))
+ }, gc = TRUE)
[1] 0.42 0.00 0.51   NA   NA

> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
[1] 1.08 0.00 1.22   NA   NA
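
The substring approach can also be wrapped in a small helper and checked
against the strptime-based conversion. This is only a minimal sketch: the
name fast_mdy is made up for illustration, and it assumes every string is
in exactly the fixed-width mm-dd-yyyy form. Both calls use the same
100000-element vector here so that the two timings can be compared directly.

```r
# Hypothetical helper (not part of R): parse fixed-width "mm-dd-yyyy"
# strings by position instead of going through strptime.
fast_mdy <- function(dd) {
  year <- as.numeric(substr(dd, 7, 10))
  mon  <- as.numeric(substr(dd, 1, 2))
  day  <- as.numeric(substr(dd, 4, 5))
  # ISOdate() returns a POSIXct at 12:00 GMT, so as.Date() yields
  # the intended calendar day.
  as.Date(ISOdate(year, mon, day))
}

dd <- rep("01-01-2005", 100000)

# Time both conversions on the same input.
print(system.time(x1 <- fast_mdy(dd)))
print(system.time(x2 <- as.Date(dd, format = "%m-%d-%Y")))

# Both routes should produce the same dates.
stopifnot(all(x1 == x2))
```

No timings are shown because they will depend on the machine and, if the
original observation is right, on the C library's strptime.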


