[Rd] Varying as.Date performance
Gabor Grothendieck
ggrothendieck at gmail.com
Thu May 5 07:00:05 CEST 2005
On 5/4/05, Jeff Enos <jeff at kanecap.com> wrote:
> R-devel,
>
> The performance of as.Date differs by a large degree between one of my
> machines with glibc 2.3.2:
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 1.17 0.00 1.18 0.00 0.00
>
> and a comparable machine with glibc 2.3.3:
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 31.20 46.89 81.01 0.00 0.00
>
> both with the same R version:
>
> > R.version
> _
> platform i686-pc-linux-gnu
> arch i686
> os linux-gnu
> system i686, linux-gnu
> status
> major 2
> minor 1.0
> year 2005
> month 04
> day 18
> language R
>
> I'm focusing on differences in glibc versions because of as.Date's use
> of strptime.
>
> Does it seem likely that the cause of this discrepancy is in fact
> glibc? If so, can anyone tell me how to make the performance of the
> second machine more like the first?
>
> I have verified that using the chron package, which I don't believe
> uses strptime, for the above character conversion performs equally
> well on both machines.
I think it's likely that the character processing is the bottleneck. You
can speed that part up by extracting the substrings directly:
> system.time({
+ dd <- rep("01-01-2005", 10000)
+ year <- as.numeric(substr(dd, 7, 10))
+ mon <- as.numeric(substr(dd, 1, 2))
+ day <- as.numeric(substr(dd, 4, 5))
+ x <- as.Date(ISOdate(year, mon, day))
+ }, gc = TRUE)
[1] 0.42 0.00 0.51 NA NA
> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
[1] 1.08 0.00 1.22 NA NA
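
Note that the two timings above use different vector lengths (10000 in the substr version, 100000 in the as.Date version), so they are not directly comparable. A minimal sketch of a like-for-like comparison follows, assuming only base R; the names n, dd, x1, x2, t_format and t_substr are illustrative and not from the original post. It also checks that both approaches give the same dates. The actual timings will depend on the machine and C library, so none are shown.

```r
# Use the same input for both approaches so the timings are comparable.
n  <- 100000
dd <- rep("01-01-2005", n)

# Approach 1: let as.Date parse each string with an explicit format.
t_format <- system.time(x1 <- as.Date(dd, format = "%m-%d-%Y"))

# Approach 2: pull the fields out by fixed position and pass them to ISOdate.
# This assumes every string has exactly the layout "mm-dd-yyyy".
t_substr <- system.time({
  year <- as.numeric(substr(dd, 7, 10))
  mon  <- as.numeric(substr(dd, 1, 2))
  day  <- as.numeric(substr(dd, 4, 5))
  x2   <- as.Date(ISOdate(year, mon, day))
})

# Both approaches should yield the same dates.
stopifnot(all(x1 == x2))

# Show the two timings side by side.
print(rbind(format = t_format, substr = t_substr))
```

If the substr version is not faster at equal length on a given machine, that would suggest the bottleneck lies elsewhere than in the format parsing done by as.Date.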