[Rd] rbind has confusing result for custom sub-class (possible bug?)
Michael Chirico
m|ch@e|ch|r|co4 @end|ng |rom gm@||@com
Mon Jun 3 05:21:52 CEST 2019
Thanks for following up! In fact that's exactly what was done here:
https://github.com/Rdatatable/data.table/pull/3602/files
On Sun, Jun 2, 2019 at 8:00 PM Joshua Ulrich <josh.m.ulrich using gmail.com>
wrote:
> I thought it would be good to summarize my thoughts, since I made a
> few hypotheses that turned out to be false.
>
> This isn't a bug in base R, in either rbind() or `[<-.Date`.
>
> To summarize the root cause:
> base::rbind.data.frame() calls `[<-` for each column of the
> data.frame, and there is no `[<-.IDate` method to ensure the
> replacement value is converted to integer. And, in fact, `[<-.Date`
> calls as.Date() and data.table::as.Date.IDate() calls as.numeric() on
> the IDate object. So the problem exists, and can be fixed, in
> data.table.
>
> Best,
> Josh
>
> On Mon, May 27, 2019 at 9:34 AM Joshua Ulrich <josh.m.ulrich using gmail.com>
> wrote:
> >
> > Follow-up (inline) on my comment about a potential issue in `[<-.Date`.
> >
> > On Mon, May 27, 2019 at 9:31 AM Michael Chirico
> > <michaelchirico4 using gmail.com> wrote:
> > >
> > > Yes, thanks for following up on thread here. And thanks again for
> clearing things up, your email was a finger snap of clarity on the whole
> issue.
> > >
> > > I'll add that actually it was data.table's code at fault on the
> storage conversion -- note that if you use an arbitrary sub-class 'foo'
> with no methods defined, it'll stay integer.
> > >
> > > That's because [<- calls as.Date and then as.Date.IDate, and that
> method (ours) has as.numeric(); earlier I had recognized that if we
> commented that line, the issue was "fixed" but I still wasn't understanding
> the root cause.
> > >
> > > My last curiosity on this issue will be in my follow-up thread.
> > >
> > > Mike C
> > >
> > > On Mon, May 27, 2019, 10:25 PM Joshua Ulrich <josh.m.ulrich using gmail.com>
> wrote:
> > >>
> > >> On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich <
> josh.m.ulrich using gmail.com> wrote:
> > >> >
> > >> > On Sun, May 26, 2019 at 4:06 AM Michael Chirico
> > >> > <michaelchirico4 using gmail.com> wrote:
> > >> > >
> > >> > > Have finally managed to come up with a fix after checking out
> sys.calls()
> > >> > > from within the as.Date.IDate debugger, which shows something
> like:
> > >> > >
> > >> > > [[1]] rbind(DF, DF)
> > >> > > [[2]] rbind(deparse.level, ...)
> > >> > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L)
> > >> > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L)
> > >> > > [[5]] as.Date(value)
> > >> > > [[6]] as.Date.IDate(value)
> > >> > >
> > >> > > I'm not sure why [<- is called, I guess the implementation is to
> assign to
> > >> > > the output block by block? Anyway, we didn't have a [<- method.
> And
> > >> > > [<-.Date looks like:
> > >> > >
> > >> > > value <- unclass(as.Date(value)) # <- converts to double
> > >> > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate'
> class
> > >> > >
> > >> > > So we can fix our bug by defining a [<- class; the question that
> I still
> > >> > > don't see answered in documentation or source code is, why/where
> is [<-
> > >> > > called, exactly?
> > >> > >
> > >> > Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The
> > >> > `[<-` call is this line:
> > >> > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij
> > >> >
> > >> > That's where the storage.mode changes from integer to double.
> > >> >
> > >> > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else
> xij
> > >> > Browse[2]>
> > >> > debug: xij
> > >> > Browse[2]> storage.mode(xij)
> > >> > [1] "integer"
> > >> > Browse[2]> value[[jj]][ri]
> > >> > [1] "2019-05-26"
> > >> > Browse[2]> storage.mode(value[[jj]][ri])
> > >> > [1] "integer"
> > >> > Browse[2]>
> > >> > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm
> > >> > Browse[2]> storage.mode(value[[jj]][ri])
> > >> > [1] "double"
> > >> >
> > >> To be clear, I don't think this is a bug in rbind() or
> > >> rbind.data.frame(). The confusion is that rbind.data.frame() calls
> > >> `[<-` for each column of the data.frame, and there is no `[<-.IDate`
> > >> method. So the parent class method is dispatched, which converts the
> > >> storage mode to double.
> > >>
> > >> Someone may argue that this is an issue with `[<-.Date`, and that it
> > >> shouldn't convert the storage.mode from integer to double.
> >
> > I don't think this is an issue. The storage mode isn't converted if
> > the replacement is the same storage mode. For example:
> >
> > R> x <- .Date(1:5)
> > R> storage.mode(x)
> > [1] "integer"
> > R> x[1L] <- .Date(0L)
> > R> storage.mode(x)
> > [1] "integer"
> > R> x[1L] <- .Date(0)
> > R> storage.mode(x)
> > [1] "double"
> >
> > >> >
> > >> > > Mike C
> > >> > >
> > >> > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico <
> michaelchirico4 using gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Debugging this issue:
> > >> > > >
> > >> > > > https://github.com/Rdatatable/data.table/issues/2008
> > >> > > >
> > >> > > > We have custom class 'IDate' which inherits from 'Date' (it
> just forces
> > >> > > > integer storage for efficiency, hence, I).
> > >> > > >
> > >> > > > The concatenation done by rbind, however, breaks this and
> returns a double:
> > >> > > >
> > >> > > > library(data.table)
> > >> > > > DF = data.frame(date = as.IDate(Sys.Date()))
> > >> > > > storage.mode(rbind(DF, DF)$date)
> > >> > > > # [1] "double"
> > >> > > >
> > >> > > > This is specific to base::rbind (data.table's rbind returns an
> integer as
> > >> > > > expected); in ?rbind we see:
> > >> > > >
> > >> > > > The method dispatching is not done via UseMethod(), but by
> C-internal
> > >> > > > dispatching. Therefore there is no need for, e.g.,
> rbind.default.
> > >> > > > The dispatch algorithm is described in the source file
> > >> > > > (‘.../src/main/bind.c’) as
> > >> > > > 1. For each argument we get the list of possible class
> memberships from
> > >> > > > the class attribute.
> > >> > > > 2. *We inspect each class in turn to see if there is an
> applicable
> > >> > > > method.*
> > >> > > > 3. If we find an applicable method we make sure that it is
> identical to
> > >> > > > any method determined for prior arguments. If it is identical,
> we proceed,
> > >> > > > otherwise we immediately drop through to the default code.
> > >> > > >
> > >> > > > It's not clear what #2 means -- an applicable method *for
> what*? Glancing
> > >> > > > at the source code would suggest it's looking for rbind.IDate:
> > >> > > >
> > >> > > >
> https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063
> > >> > > >
> > >> > > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind");
> // should
> > >> > > > be rbind here
> > >> > > > const char *s = translateChar(STRING_ELT(classlist, i)); //
> iterating over
> > >> > > > the classes, should get to IDate first
> > >> > > > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate
> > >> > > >
> > >> > > > but adding this method (or even exporting it) is no help [
> simply defining
> > >> > > > rbind.IDate = function(...) as.IDate(NextMethod()) ]
> > >> > > >
> > >> > > > Lastly, it appears that as.Date.IDate is called, which is
> causing the type
> > >> > > > conversion:
> > >> > > >
> > >> > > > debug(data.table:::as.Date.IDate)
> > >> > > > rbind(DF, DF) # launches debugger
> > >> > > > x
> > >> > > > # [1] "2019-05-26" <-- singleton, so apparently applied to
> DF$date, not
> > >> > > > c(DF$date, DF$date)
> > >> > > > undebug(data.table:::as.Date.IDate)
> > >> > > >
> > >> > > > I can't really wrap my head around why as.Date is being called
> here, and
> > >> > > > even allowing that, why the end result is still the original
> class [
> > >> > > > class(rbind(DF, DF)$date) == c('IDate', 'Date') ]
> > >> > > >
> > >> > > > So, I'm beginning to think this might be a bug. Am I missing
> something?
> > >> > > >
> > >> > >
> > >> > > [[alternative HTML version deleted]]
> > >> > >
> > >> > > ______________________________________________
> > >> > > R-devel using r-project.org mailing list
> > >> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >> >
> > >>
> >
>
>
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com
>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list