[Rd] Suggestion: mkString(NULL) should be NA
Martin Maechler
maechler at stat.math.ethz.ch
Wed May 25 12:31:36 CEST 2016
>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>> on Tue, 24 May 2016 10:30:48 -0700 writes:
> On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
> wrote:
>> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbecker at ucdavis.edu>
>> wrote:
>> > Shouldn't Rf_mkString(NULL) return (the c-level equivalent of)
>> character()
>> > rather than the NA_character_?
>>
>> No. It should still be safe to assume that mkString() always returns a
>> character vector of exactly length one. Anything else could lead to
>> type errors.
>>
> Well the thing is you're passing an invalid pointer, that doesn't point to
> a C string, to a constructor expecting a valid const char *. I'm fine with
> the contract being that mkString always returns a character vector of
> length one, but that doesn't necessarily mean that the function needs to
> accept NULL pointers. The contract as I understand it is that if you give
> it a C string, it will create a CHARSXP for that string. In this light,
> Bill's suggestion that it throw an error seems the most principled
> response. I would think you would need to at the very least emit a warning.
I agree with Jerooen that mkChar() and mkString() may be
used in contexts where they can end up with a NULL and hence
should not segfault... and hence am willing the extra (very
small) penalty of checking for NULL.
>>
>> > An empty string and NULL aren't the same.
>>
>> Exactly! So if you pass in an empty C string, you get an empty R
>> string, and if you pass in a null pointer you get NA.
>>
>> Rf_mkString(NULL) <--> NA
>> Rf_mkString("") <--> ""
>>
>> There is no ambiguity, and much better than segfaulting.
Better than segfaulting, yes, but really agree with Bill (and
Gabe), also for Rf_mkChar(NULL):
I think both functions should give an error in such a case
rather than returning NA_character_
It is an accident of some kind if they got NULL, no?
--
Martin Maechler,
ETH Zurich
> Well, better than segfaulting is not really relevant here. No one is
> arguing that it should segfault. The question is what behavior it should
> have when it doesn't segfault.
> It's true that a C empty string is not the same as NULL, but NULL isn't the
> same as NA either. Semantically, for your use-case (which I gather arose
> from interactions we had :) ) the NULL means there is no version, while NA
> indicates there is a version but we don't know what it is. Imagine an
> object class that represents a persons name (first, middle, last). Now take
> two people, One has no middle name (and we know that when creating the
> object) and another for whom we don't have any information about the middle
> name, only first and last were reported. I would expect the first one to
> have middle name either NULL or (in a data.frame context) "", while the
> second would have NA_character_. In this light, mkString should arguably
> generate "". i don't think the fact that there is another way to get "" is
> a particularly large problem.
> On the other hand, and in support of your position it came up as Michael
> Lawrence and I were talking about this that asChar from utils.c will give
> you NA_STRING when you give it R_NilValue. That is a coercion though,
> whereas arguably mkString is not. That said, consistency would probably be
> good.
> ~G
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research
> [[alternative HTML version deleted]]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list