[Rd] Varying as.Date performance

Jeff Enos jeff at kanecap.com
Thu May 5 15:24:26 CEST 2005


Prof Brian Ripley writes:
 > One other possible difference would be locale, but this is slow on FC3 
 > (2.3.4 now) in the C locale.  Almost all the time is in strptime:
 > R profiling shows
 > 
 > > summaryRprof()
 > $by.self
 >                      self.time self.pct total.time total.pct
 > "strptime"              29.58     99.7      29.58      99.7
 > "as.Date.character"      0.10      0.3      29.68     100.0
 > "as.Date"                0.00      0.0      29.68     100.0
 > "eval"                   0.00      0.0      29.68     100.0
 > "system.time"            0.00      0.0      29.68     100.0
 > 
 > Now on a glibc 2.3.x system R's internal replacement for strptime will be 
 > used (to work around bugs) so it must be some other part of the POSIX 
 > time-handling that has changed.
 > 
 > The next step would be to do C-level profiling and then retrofit the 
 > crucial code from glibc 2.3.2.
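For anyone wanting to reproduce an R-level profile like the one quoted above, here is a minimal sketch that uses the base utilities Rprof() and summaryRprof(). The temporary file and the 0.001 s sampling interval are arbitrary choices, not taken from the thread. Timings on current R versions will differ from the 2005 figures.

```r
# Sketch: collect an R-level profile of as.Date(), similar to the summaryRprof()
# output quoted above. The temp file and sampling interval are arbitrary choices.
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.001)   # start the sampling profiler
x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y")
Rprof(NULL)                          # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.self))            # functions ranked by self time
```

The by.self table shows where time is spent in each function itself. Finding which C routines dominate inside strptime requires a C-level profiler, as described above.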

Thanks for these suggestions.  C-level profiling yields the following:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 36.01      5.34     5.34   100000     0.00     0.00  get_locale_strings
  4.32      5.98     0.64   100000     0.00     0.00  mktime00
  3.98      6.57     0.59   277462     0.00     0.00  Rf_eval
  3.71      7.12     0.55   472935     0.00     0.00  Rf_findVarInFrame3
  3.64      7.66     0.54   100000     0.00     0.00  strptime_internal
  3.51      8.18     0.52        1     0.52     7.51  do_strptime

It looks like strftime is called from get_locale_strings, which might
be the culprit.  Any suggestions on where I might go from here?

 > It does seem a pretty unusual application of R for 10^5 date conversions 
 > to be needed and for 30 secs to be an appreciable part of the analysis 
 > time on such a data set.

This is an issue for me when interactively loading a sizable
timeseries dataset into R from Postgres, converting character strings
into objects of class Date.
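A time series pulled from a database may repeat the same date strings many times, for example one row per series per day. This is an assumption about the data. Under it, one workaround not raised in this thread is to parse each distinct string only once and then expand the result with match(). A minimal sketch follows; the helper name as_date_unique() is hypothetical.

```r
# Hypothetical helper (not part of base R): parse each distinct string once,
# then map the parsed dates back onto the full vector with match().
as_date_unique <- function(x, format) {
  ux <- unique(x)                         # distinct date strings
  parsed <- as.Date(ux, format = format)  # strptime runs only length(ux) times
  parsed[match(x, ux)]                    # expand back to original length and order
}

# Example: 100000 strings but only three distinct dates.
dates <- rep(c("01-01-2005", "01-02-2005", "01-03-2005"), length.out = 100000)
d <- as_date_unique(dates, format = "%m-%d-%Y")
```

This helps only when there are far fewer distinct strings than elements. If every string is distinct, it performs the same parsing plus the overhead of unique() and match().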

Thanks,

Jeff

 > 
 > On Wed, 4 May 2005, Jeff Enos wrote:
 > 
 > > R-devel,
 > >
 > > The performance of as.Date differs by a large degree between one of my
 > > machines with glibc 2.3.2:
 > >
 > >> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
 > > [1] 1.17 0.00 1.18 0.00 0.00
 > >
 > > and a comparable machine with glibc 2.3.3:
 > >
 > >> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
 > > [1] 31.20 46.89 81.01  0.00  0.00
 > >
 > > both with the same R version:
 > >
 > >> R.version
 > >         _
 > > platform i686-pc-linux-gnu
 > > arch     i686
 > > os       linux-gnu
 > > system   i686, linux-gnu
 > > status
 > > major    2
 > > minor    1.0
 > > year     2005
 > > month    04
 > > day      18
 > > language R
 > >
 > > I'm focusing on differences in glibc versions because of as.Date's use
 > > of strptime.
 > >
 > > Does it seem likely that the cause of this discrepancy is in fact
 > > glibc?  If so, can anyone tell me how to make the performance of the
 > > second machine more like the first?
 > >
 > > I have verified that using the chron package, which I don't believe
 > > uses strptime, for the above character conversion performs equally
 > > well on both machines.
 > 
 > -- 
 > Brian D. Ripley,                  ripley at stats.ox.ac.uk
 > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 > University of Oxford,             Tel:  +44 1865 272861 (self)
 > 1 South Parks Road,                     +44 1865 272866 (PA)
 > Oxford OX1 3TG, UK                Fax:  +44 1865 272595


