[R] Mystery Error in midnightStandard

Yohan Chalabi chalabi at phys.ethz.ch
Wed Jan 28 16:28:28 CET 2009

>>>> "TB" == Ted Byers <r.ted.byers at gmail.com>
>>>> on Wed, 28 Jan 2009 09:30:58 -0500

   TB> It is certain that all entries have the same format, but I'm
   TB> starting to think that the error message is something of a
   TB> red herring.  Consider this:
   TB> > year = 2009
   TB> > week = 0
   TB> > day = 3
   TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
   TB> [1] "2009-0-3"
   TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
   TB> > date1
   TB> GMT
   TB> [1] [NA]
   TB> > day = 4
   TB> > datestr = sprintf("%i-%i-%i",year,week,day);datestr
   TB> [1] "2009-0-4"
   TB> > date1 = timeDate(datestr, format = "%Y-%U-%w");
   TB> > date1
   TB> GMT
   TB> [1] [2009-01-01]
   TB> >
   TB> > datestr = sprintf("%i-%i-%i",year,week,3);datestr
   TB> [1] "2009-0-3"
   TB> > date2 = timeDate(datestr, format = "%Y-%U-%w");date2
   TB> GMT
   TB> [1] [NA]
   TB> > difftimeDate(date2,date1, units = "weeks")
   TB> Error in midnightStandard(charvec, format) :
   TB> 'charvec' has non-NA entries of different number of characters
   TB> In addition: Warning messages:
   TB> 1: In min(x) : no non-missing arguments to min; returning Inf
   TB> 2: In max(x) : no non-missing arguments to max; returning -Inf
   TB> The first values for year, week and day are the values on
   TB> which my loop dies.  It returns 'NA' here.  It seems clear
   TB> that it is returning NA because the date that data
   TB> corresponds to is 2008-12-31.
   TB> The error is being produced by difftimeDate rather than
   TB> timeDate (as shown by the above session).  But that
   TB> represents a flaw in the function design.

This is not a flaw in timeDate. It behaves the same way as

strptime(datestr, format = "%Y-%U-%w")

Instead of claiming that there is a flaw in the function you could have
suggested an 'is.na' method for 'timeDate'.

I will add an 'is.na' method in the dev version of 'timeDate'.
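
Until that method is available, one possible workaround is to test
for a missing date with base R only. The following is a minimal
sketch, assuming the string is parsed with 'strptime' and converted
to a base 'Date'; the helper name 'default_if_na' is hypothetical
and not part of 'timeDate'.

```r
# Hypothetical helper (not from 'timeDate'): replace missing dates
# by a fallback value and leave valid dates untouched.
default_if_na <- function(x, default) {
    x[is.na(x)] <- default
    x
}

datestr <- sprintf("%i-%i-%i", 2009L, 0L, 3L)

# Same string and format as in the session above, parsed with base R
parsed <- as.Date(strptime(datestr, format = "%Y-%U-%w", tz = "GMT"))

# Fall back to New Year's Day of that year if no valid date was parsed
date2 <- default_if_na(parsed, as.Date("2009-01-01"))
```

Whether New Year's Day is a sensible default depends on the data;
the point is only that the NA is detected before any difference is
computed.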


   TB> It should fail when taking the elapsed time between a null
   TB> and the present, but if I wrote such a function, I'd have it
   TB> return null (perhaps with a warning) rather than just die.
   TB> A bigger issue is that timeDate ought never give null here
   TB> (which is what I assume 'NA' means), since all the data comes
   TB> from transaction data with real dates, so the elapsed time,
   TB> measured in weeks, ought to always be a valid real number
   TB> that is positive semidefinite.  I have not yet come to any
   TB> conclusions as to how it ought to behave (whether to return
   TB> new years day, along with a warning, or to return the date
   TB> requested by reinvoking itself with the year and week
   TB> adjusted so a valid date is returned).
   TB> On a practical side, how would I test date2 to see if it is
   TB> null, so I can give it a sensible default value?
   TB> A more troubling thought is that with this handling of dates
   TB> in this combination of SQL (my group by clause uses
   TB> YEAR(transaction_date),WEEK(transaction_date)) to get the
   TB> data and R to process it, the week containing new years day
   TB> will ALWAYS be split in two at the first second of the new
   TB> year. I'm going to have to either figure out a way to
   TB> correct this, or ignore it (as it doesn't actually make
   TB> things wrong, but rather it splits a sample into two unequal
   TB> parts).
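
One way around the split, assuming the actual transaction dates (or
at least one date per group) can be brought into R, might be to label
each week by the Sunday that starts it rather than by a (year, week)
pair. A rough sketch in base R follows; 'week_start' is a
hypothetical helper, not a function from 'timeDate'.

```r
# Hypothetical helper: the Sunday on or before each date.
# format(x, "%w") gives the weekday as a number, with Sunday = 0.
week_start <- function(x) {
    x - as.integer(format(x, "%w"))
}

d <- as.Date(c("2008-12-31", "2009-01-01"))
week_start(d)   # both dates fall in the week starting 2008-12-28

# Elapsed time in weeks between two week labels
as.numeric(difftime(week_start(as.Date("2009-01-28")),
                    week_start(d[1]), units = "weeks"))
```

Because the label is a real date, the last days of December and the
first days of January end up in the same group, and differences
between labels are whole numbers of weeks.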

PhD student
Swiss Federal Institute of Technology

