[Rd] For integer vectors, `as(x, "numeric")` has no effect.

Martin Maechler maechler at stat.math.ethz.ch
Tue Jan 5 10:31:00 CET 2016


>>>>> Josh O'Brien <joshmobrien at gmail.com>
>>>>>     on Mon, 4 Jan 2016 16:16:51 -0800 writes:

    > On Dec 19, 2015, at 3:32 AM, Martin Maechler
    > <maechler at stat.math.ethz.ch> wrote:

    >>>>>>> Martin Maechler <maechler at stat.math.ethz.ch> on
    >>>>>>> Sat, 12 Dec 2015 10:32:51 +0100 writes:
    >> 
    >>>>>>> John Chambers <jmc at r-project.org> on Fri, 11 Dec
    >>>>>>> 2015 10:11:05 -0800 writes:
    >> 
    >>>> Somehow, the most obvious fixes are always
    >>>> back-incompatible these days.  The example intrigued
    >>>> me, so I looked into it a bit (should have been doing
    >>>> something else, but ....)
    >> 
    >>>> You're right that this is the proverbial
    >>>> thin-edge-of-the-wedge.
    >> 
    >>>> The problem is in setDataPart(), which will be called
    >>>> whenever a class extends one of the vector types.
    >> 
    >>>> It does as(value, dataClass).  The key point is that the
    >>>> third argument to as() is strict=TRUE by default.  So,
    >>>> yes, the change will cause all integer vectors to
    >>>> become double when the class extends "numeric".
    >>>> Generally, strict=TRUE makes sense here and of course
    >>>> changing THAT would open up yet more incompatibilities.
    >> 
    >>>> For back compatibility, one would have to have some
    >>>> special code in setDataPart() for the case of
    >>>> integer/numeric.
    >> 
    >>>> John
    >> 
    >>>> (Historically, the original sin was probably not making
    >>>> a distinction between "numeric" as a virtual class and
    >>>> "double" as a type/class.)
    >> 
    >>> Yes, indeed.  In the mean time, I've seen more cases
    >>> where "the change will cause all integer vectors to
    >>> become double when the class extends 'numeric'" seems
    >>> detrimental.
    >> 
    >>> OTOH, I still think we could go in the right direction
    >>> --- hopefully along the wishes of bioconductor S4
    >>> development, see Martin Morgan's e-mail:
    >> 
    >>> [This is all S4 - only; should not much affect base R /
    >>> S3] Currently, "integer" is a subclass of "numeric" and
    >>> so the "integer become double" part seems unwanted to
    >>> me.  OTOH, it would really make sense to more formally
    >>> have the basic subclasses of "numeric" to be "integer"
    >>> and "double", and to let as(*, "double") become
    >>> different from as(*, "numeric") [Again, this is just for
    >>> the S4 classes and as() coercions, *not* e.g.  for
    >>> as.numeric() / as.double() !]
    >> 
    >>> In the DEPRECATED part of the NEWS for R 2.7.0 (April
    >>> 2008) we have had
    >> 
    >>> o The S4 pseudo-classes "single" and double have been
    >>> removed.  (The S4 class for a REALSXP is "numeric": for
    >>> back-compatibility as(x, "double") coerces to
    >>> "numeric".)
    >> 
    >>> I think the removal of "single" was fine, but in
    >>> hindsight, maybe the removal of "double" -- which was
    >>> partly broken then -- possibly could rather have been a
    >>> fixup of "double" along the following
    >> 
    >>> Current "thought experiment proposal" :
    >> 
    >>> 1) "numeric" := {"integer", "double"} { class - subclasses }
    >>> 2) as(1L, "numeric") continues to return 1L
    >>>    .. since integer is one case of "numeric"
    >>> 3) as(1L, "double") newly returns 1.0
    >>>    {and in fact would be "equivalent" to as.double(1L)}
    >> 
    >>> After the above change, S4 as(*, "double") would
    >>> correspond to S3 as.double but as(*, "numeric") would
    >>> continue to differ from as.numeric(*), the former *not*
    >>> changing integers to double.
    >> 
    >>> Martin
    >> 
    >> Also note that e.g.
    >> 
    >> class(pi) would return "double" instead of "numeric"
    >> 
    >> and this will break all the bad programming style usages
    >> of
    >> 
    >> if(class(x) == "numeric")
    >> 
    >> which I tend to see in gazillions of user and even
    >> package codes.  This is bad (aka error prone !) because
    >> "correct" usage would be
    >> 
    >> if(inherits(x, "numeric"))
    >> 
    >> and that of course would *not* break after the change
    >> above.
    >> 
    >> - - - -
    >> 
    >> A week later, I'm still pretty convinced it would be
    >> worth going in the direction proposed above.
    >> 
    >> But I was actually hoping for some encouragement or
    >> "mental support"...  or then to hear why you think the
    >> proposition is not good or not viable ...
    >> 
    >> 

    > I really like Martin Maechler's "thought experiment
    > proposal", but (based partly on the reception it's gotten)
    > figure I mustn't be appreciating the complications it
    > would introduce..

Actually, I've spent half a day implementing it and was very
pleased with it... as a matter of fact it passed *all* our checks,
also in all recommended packages (*)

To do it cleanly... with very few code changes,
the *only* consequence would be that

   class(1.)

(and similar) would then return "double" instead of "numeric",
which *would* be the logical consequence, because indeed,

   numeric = {integer, double}

in that new scheme, and     class(1L) also returns "integer".

To my big chagrin there was very strong opposition to such a change,
IIRC, mainly on the grounds that for 20 years or so S and then R
books and publications had said that double and numeric should
be basically the same.
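
For concreteness, a small sketch of what that one visible change
amounts to.  This is only an illustration, not the implementation
mentioned above; the names x_dbl and x_int are made up for the
example, and the comment about "double" describes the proposal, not
any released version of R.

```r
library(methods)

# Illustration only; x_dbl and x_int are hypothetical names.
x_dbl <- 1.
x_int <- 1L

typeof(x_dbl)          # "double"
typeof(x_int)          # "integer"

class(x_int)           # "integer"
class(x_dbl)           # "numeric" in released R; "double" under the proposal

# In the S4 sense both already count as "numeric", which is what
# numeric = {integer, double} spells out:
is(x_dbl, "numeric")   # TRUE
is(x_int, "numeric")   # TRUE
```

The point of the sketch is that typeof() already distinguishes the
two storage types; the proposal would only make class() follow suit
for doubles, as it already does for integers.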
   
(*) Below you have a C level proposal which, as you note, is
    similar to John Chambers' R level change:

The consequence is that basically you can no longer have "integer"
entries in "numeric" slots; they are automagically made into "double".
I personally find that not really "acceptable" {waste of storage},
and I would guess that more code "out there in package-land and
user-code" would break than with my change.
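
A minimal sketch of the situation I mean, assuming a toy class; the
class name "Acc" and its slot "counts" are hypothetical and exist only
for this example.  As far as I understand the current behaviour, the
slot keeps its integer storage; under the coercing variant the last
line would report "double" instead, at 8 rather than 4 bytes per
element.

```r
library(methods)

# "Acc" and "counts" are hypothetical names used only for this sketch.
setClass("Acc", representation(counts = "numeric"))

# Integer data offered to a slot declared as "numeric".
a <- new("Acc", counts = 1:3)

is.numeric(a@counts)   # TRUE in either scheme
typeof(a@counts)       # the storage type actually kept in the slot
```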

    > That said, if it's decided to just make a smaller fix of
    > as(x, "numeric"), might it be better to make the change at
    > the C level, to R_set_class in $RHOME/src/main/coerce.c?

I'm not seeing the advantage of making the change there, apart
from possibly some efficiency gain.

For the time being, I will not work on this ... mainly as I still
believe that my proposal would lead to a much much cleaner setup
(and yes, even be worth some small changes in new editions of
 those R books which deal with such subtle issues)

Martin


