[Rd] [External] Re: 1954 from NA

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed May 26 18:05:45 CEST 2021


After 5 minutes more thought:

- code non-missing as missingKind = NA, not 0, so that missingKind could 
be a character vector, or missingKind = 0 could be supported.

- print methods should return the main argument, so mine should be

print.MultiMissing <- function(x, ...) {
   vals <- as.character(x)
   if (!is.character(x) || inherits(x, "noquote"))
     print(noquote(vals))
   else
     print(vals)
   invisible(x)
}

This still needs a lot of improvement to be a good print method, but 
I'll leave that to you.

Duncan Murdoch

On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
> On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
>> Dear Duncan,
>>
>> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <murdoch.duncan using gmail.com
>> <mailto:murdoch.duncan using gmail.com>> wrote:
>>
>>      You've already been told how to solve this:  just add attributes to the
>>      objects. Use the standard NA to indicate that there is some kind of
>>      missingness, and the attribute to describe exactly what it is.  Stick a
>>      class on those objects and define methods so that subsetting and
>>      arithmetic preserves the extra info you've added. If you do some
>>      operation that turns those NAs into NaNs, big deal:  the attribute will
>>      still be there, and is.na <http://is.na>(NaN) still returns TRUE.
>>
>>
>> I've already tried the attributes way, it is not so easy.
> 
> If you have specific operations that are needed but that you can't get
> to work, post the issue here.
> 
>> In the best case scenario, it unnecessarily triples the size of the
>> data, but perhaps this is the only way forward.
> 
> I don't see how it could triple the size.  Surely an integer has enough
> values to cover all possible kinds of missingness.  So on integer or
> factor data you'd double the size, on real or character data you'd
> increase it by 50%.  (This is assuming you're on a 64 bit platform with
> 32 bit integers and 64 bit reals and pointers.)
> 
> Here's a tiny implementation to show what I'm talking about:
> 
> asMultiMissing <- function(x) {
>     if (isMultiMissing(x))
>       return(x)
>     missingKind <- ifelse(is.na(x), 1, 0)
>     structure(x,
>               missingKind = missingKind,
>               class = c("MultiMissing", class(x)))
> }
> 
> isMultiMissing <- function(x)
>     inherits(x, "MultiMissing")
> 
> missingKind <- function(x) {
>     if (isMultiMissing(x))
>       attr(x, "missingKind")
>     else
>       ifelse(is.na(x), 1, 0)
> }
> 
> `missingKind<-` <- function(x, value) {
>     class(x) <- setdiff(class(x), "MultiMissing")
>     x[value != 0] <- NA
>     x <- asMultiMissing(x)
>     attr(x, "missingKind") <- value
>     x
> }
> 
> `[.MultiMissing` <- function(x, i, ...) {
>     missings <- missingKind(x)
>     x <- NextMethod()
>     missings <- missings[i]
>     missingKind(x) <- missings
>     x
> }
> 
> print.MultiMissing <- function(x, ...) {
>     vals <- as.character(x)
>     if (!is.character(x) || inherits(x, "noquote"))
>       print(noquote(vals))
>     else
>       print(vals)
> }
> 
> `[<-.MultiMissing` <- function(x, i, value, ...) {
>     missings <- missingKind(x)
>     class(x) <- setdiff(class(x), "MultiMissing")
>     x[i] <- value
>     missings[i] <- missingKind(value)
>     missingKind(x) <- missings
>     x
> }
> 
> as.character.MultiMissing <- function(x, ...) {
>     missings <- missingKind(x)
>     result <- NextMethod()
>     ifelse(missings != 0,
>            paste0("NA.", missings), result)
> 
> }
> 
> This is incomplete.  It doesn't do printing very well, and it doesn't
> handle the case of assigning a MultiMissing value to a regular vector at
> all.  (I think you'd need an S4 implementation if you want to support
> that.)  But it does the basics:
> 
>   > x <- 1:10
>   > missingKind(x)[4] <- 23
>   > x
>    [1] 1     2     3     NA.23 5     6     7     8     9
> [10] 10
>   > is.na(x)
>    [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
> [10] FALSE
>   > missingKind(x)
>    [1]  0  0  0 23  0  0  0  0  0  0
>   >
> 
> Duncan Murdoch
> 
>>
>>      Base R doesn't need anything else.
>>
>>      You complained that users shouldn't need to know about attributes, and
>>      they won't:  you, as the author of the package that does this, will
>>      handle all those details.  Working in your subject area you know all
>>      the
>>      different kinds of NAs that people care about, and how they code
>>      them in
>>      input data, so you can make it all totally transparent.  If you do it
>>      well, someone in some other subject area with a completely different
>>      set
>>      of kinds of missingness will be able to adapt your code to their use.
>>
>>
>> But that is the whole point: the package author does not define possible
>> NAs (the possibilities are infinite), users do that.
>> The package should only provide a simple method to achieve that.
>>
>>
>>      I imagine this has all been done in one of the thousands of packages on
>>      CRAN, but if it hasn't been done well enough for you, do it better.
>>
>>
>> If it were, I would have found it by now...
>>
>> Best wishes,
>> Adrian
>



More information about the R-devel mailing list