[Rd] Suggestion: mkString(NULL) should be NA

Martin Maechler maechler at stat.math.ethz.ch
Wed May 25 12:31:36 CEST 2016


>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>>     on Tue, 24 May 2016 10:30:48 -0700 writes:

    > On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
    > wrote:

    >> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbecker at ucdavis.edu>
    >> wrote:
    >> > Shouldn't Rf_mkString(NULL) return (the C-level equivalent of)
    >> > character() rather than NA_character_?
    >> 
    >> No. It should still be safe to assume that mkString() always returns a
    >> character vector of exactly length one. Anything else could lead to
    >> type errors.
    >> 


    > Well the thing is you're passing an invalid pointer, that doesn't point to
    > a C string, to a constructor expecting a valid const char *. I'm fine with
    > the contract being that mkString always returns a character vector of
    > length one, but that doesn't necessarily mean that the function needs to
    > accept NULL pointers. The contract as I understand it is that if you give
    > it a C string, it will create a CHARSXP for that string. In this light,
    > Bill's suggestion that it throw an error seems the most principled
    > response. I would think you would need to at the very least emit a warning.

I agree with Jeroen that mkChar() and mkString() may be
used in contexts where they can end up with a NULL and hence
should not segfault... and hence I am willing to pay the extra
(very small) penalty of checking for NULL.

    >> 
    >> > An empty string and NULL aren't the same.
    >> 
    >> Exactly! So if you pass in an empty C string, you get an empty R
    >> string, and if you pass in a null pointer you get NA.
    >> 
    >> Rf_mkString(NULL)   <--> NA
    >> Rf_mkString("")   <--> ""
    >> 
    >> There is no ambiguity, and much better than segfaulting.

Better than segfaulting, yes, but I really agree with Bill (and
Gabe), also for Rf_mkChar(NULL):
I think both functions should give an error in such a case
rather than returning NA_character_.

It is an accident of some kind if they got NULL, no?
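
The two policies under discussion can be contrasted in a small standalone C model. This is only a sketch under stated assumptions: `model_string`, `mk_string_na`, and `mk_string_strict` are hypothetical names, not part of R's API, and where the strict variant returns an error code, real R code would presumably signal an R-level error (for example via `Rf_error()`) instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a length-one character vector. */
typedef struct {
    bool is_na;          /* true models NA_character_ */
    const char *value;   /* borrowed C string; NULL when is_na */
} model_string;

/* Policy 1 (NULL -> NA): never fails, always yields one element. */
static model_string mk_string_na(const char *s)
{
    model_string out = { s == NULL, s };
    return out;
}

/* Policy 2 (NULL is a caller bug): report failure instead of
 * inventing a value. Returns 0 on success, -1 on a NULL input. */
static int mk_string_strict(const char *s, model_string *out)
{
    if (s == NULL)
        return -1;       /* real R code would signal an error here */
    out->is_na = false;
    out->value = s;
    return 0;
}
```

Either way the cost is a single pointer comparison; the difference is whether an accidental NULL is silently turned into a value or surfaced to the caller.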

--
Martin Maechler,
ETH Zurich



    > Well, better than segfaulting is not really relevant here. No one is
    > arguing that it should segfault. The question is what behavior it should
    > have when it doesn't segfault.

    > It's true that a C empty string is not the same as NULL, but NULL isn't the
    > same as NA either. Semantically, for your use-case (which I gather arose
    > from interactions we had :) ) the NULL means there is no version, while NA
    > indicates there is a version but we don't know what it is. Imagine an
    > object class that represents a person's name (first, middle, last). Now take
    > two people: one has no middle name (and we know that when creating the
    > object), and another for whom we don't have any information about the middle
    > name; only first and last were reported. I would expect the first one to
    > have middle name either NULL or (in a data.frame context) "", while the
    > second would have NA_character_. In this light, mkString should arguably
    > generate "". I don't think the fact that there is another way to get "" is
    > a particularly large problem.
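
The distinction drawn in the name example can be made explicit with a three-state field. The following is a minimal standalone sketch, not R code: the `field_state` enum, the `classify_middle_name` function, and the `reported` flag are all hypothetical, introduced only to show that a bare `const char *` cannot by itself separate "known to be absent" from "unknown".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-state field for the middle-name example. */
typedef enum {
    FIELD_ABSENT,   /* known not to exist: NULL, or "" in a data.frame */
    FIELD_UNKNOWN,  /* may exist but was not reported: NA_character_ */
    FIELD_PRESENT   /* a reported, non-empty value */
} field_state;

/* `reported` says whether the source supplied any information at all;
 * `s` is the supplied string, if any. */
static field_state classify_middle_name(bool reported, const char *s)
{
    if (!reported)
        return FIELD_UNKNOWN;          /* no information: NA */
    if (s == NULL || s[0] == '\0')
        return FIELD_ABSENT;           /* known to have none: NULL or "" */
    return FIELD_PRESENT;
}
```

Under this reading, a NULL pointer on its own maps to "absent" rather than "unknown", which is the argument for `""` over `NA_character_`.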

    > On the other hand, and in support of your position, it came up as Michael
    > Lawrence and I were talking about this that asChar from util.c will give
    > you NA_STRING when you give it R_NilValue. That is a coercion though,
    > whereas arguably mkString is not. That said, consistency would probably be
    > good.

    > ~G


    > -- 
    > Gabriel Becker, PhD
    > Associate Scientist (Bioinformatics)
    > Genentech Research



