[Rd] Varying as.Date performance
Jeff Enos
jeff at kanecap.com
Thu May 5 15:24:26 CEST 2005
Prof Brian Ripley writes:
> One other possible difference would be locale, but this is slow on FC3
> (2.3.4 now) in the C locale. Almost all the time is in strptime:
> R profiling shows
>
> > summaryRprof()
> $by.self
>                     self.time self.pct total.time total.pct
> "strptime"              29.58     99.7      29.58      99.7
> "as.Date.character"      0.10      0.3      29.68     100.0
> "as.Date"                0.00      0.0      29.68     100.0
> "eval"                   0.00      0.0      29.68     100.0
> "system.time"            0.00      0.0      29.68     100.0
>
> Now on a glibc 2.3.x system R's internal replacement for strptime will be
> used (to work around bugs) so it must be some other part of the POSIX
> time-handling that has changed.
>
> The next step would be to do C-level profiling and then retrofit the
> crucial code from glibc 2.3.2.
Thanks for these suggestions. C-level profiling yields the following:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
36.01      5.34     5.34   100000     0.00     0.00  get_locale_strings
 4.32      5.98     0.64   100000     0.00     0.00  mktime00
 3.98      6.57     0.59   277462     0.00     0.00  Rf_eval
 3.71      7.12     0.55   472935     0.00     0.00  Rf_findVarInFrame3
 3.64      7.66     0.54   100000     0.00     0.00  strptime_internal
 3.51      8.18     0.52        1     0.52     7.51  do_strptime
It looks like strftime is called from get_locale_strings, which might
be the culprit. Any suggestions on where I might go from here?
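The profile shows 100000 calls to get_locale_strings, matching the number of input strings, which suggests the cost is paid once per element rather than once per call. A minimal sketch for checking that from the R side, assuming nothing beyond base R: time the same conversion at several input sizes and see whether the cost per 1000 elements stays roughly constant. The helper name time_as_date and the chosen sizes are illustrative, not from the thread.

```r
# Hypothetical helper: elapsed seconds for converting n identical strings.
time_as_date <- function(n) {
  x <- rep("01-01-2005", n)
  unname(system.time(as.Date(x, format = "%m-%d-%Y"))["elapsed"])
}

# Illustrative sizes; the thread itself only used n = 100000.
sizes <- c(1000, 10000, 100000)
timings <- data.frame(n = sizes, elapsed = sapply(sizes, time_as_date))

# A roughly constant value here points to a per-element overhead.
timings$sec_per_1000 <- timings$elapsed / (timings$n / 1000)
print(timings)
```

If sec_per_1000 is similar across rows on the slow machine, that would be consistent with a fixed overhead on every element, though it does not by itself identify get_locale_strings as the cause.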
> It does seem a pretty unusual application of R for 10^5 date conversions
> to be needed and for 30 secs to be an appreciable part of the analysis
> time on such a data set.
This is an issue for me when interactively loading a sizable
timeseries dataset into R from Postgres, converting character strings
into objects of class Date.
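One possible workaround for this kind of data, not proposed in the thread and sketched here only under the assumption that the column holds many repeated date strings (as a timeseries often does): parse each distinct string once and map the results back with match(). The helper name as_date_unique is hypothetical.

```r
# Hypothetical helper: call as.Date() only on the distinct strings,
# then expand the parsed values back to the original length.
as_date_unique <- function(x, format) {
  ux <- unique(x)                          # distinct strings (NA included)
  parsed <- as.Date(ux, format = format)   # one strptime pass per distinct value
  parsed[match(x, ux)]                     # indexing a Date keeps its class
}

# Example input: two distinct dates repeated many times, plus one NA.
x <- rep(c("01-01-2005", "01-02-2005", NA), times = c(50000, 49999, 1))
d <- as_date_unique(x, format = "%m-%d-%Y")
```

This only reduces the number of conversions; it helps little if nearly every string is distinct, and it does not address the underlying slowdown.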
Thanks,
Jeff
>
> On Wed, 4 May 2005, Jeff Enos wrote:
>
> > R-devel,
> >
> > The performance of as.Date differs by a large degree between one of my
> > machines with glibc 2.3.2:
> >
> >> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 1.17 0.00 1.18 0.00 0.00
> >
> > and a comparable machine with glibc 2.3.3:
> >
> >> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 31.20 46.89 81.01 0.00 0.00
> >
> > both with the same R version:
> >
> >> R.version
> >          _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status
> > major    2
> > minor    1.0
> > year     2005
> > month    04
> > day      18
> > language R
> >
> > I'm focusing on differences in glibc versions because of as.Date's use
> > of strptime.
> >
> > Does it seem likely that the cause of this discrepancy is in fact
> > glibc? If so, can anyone tell me how to make the performance of the
> > second machine more like the first?
> >
> > I have verified that using the chron package, which I don't believe
> > uses strptime, for the above character conversion performs equally
> > well on both machines.
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595