[Rd] [External] Re: 1954 from NA
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Wed May 26 18:05:45 CEST 2021
After 5 minutes more thought:
- code non-missing values as missingKind = NA rather than 0, so that
missingKind could be a character vector, or so that missingKind = 0 could
itself be supported as a kind of missingness.
- print methods should return the main argument, so mine should be
print.MultiMissing <- function(x, ...) {
  vals <- as.character(x)
  if (!is.character(x) || inherits(x, "noquote"))
    print(noquote(vals))
  else
    print(vals)
  invisible(x)
}
This still needs a lot of improvement to be a good print method, but
I'll leave that to you.
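A minimal sketch of the first point above, assuming character labels for the kinds and NA for "not missing". The labels ("refused", "unknown") and the defaultKind() helper are made up for illustration; only missingKind(), `missingKind<-` and the "NA." prefix follow the quoted implementation.

```r
# Sketch only: kinds are character labels; NA means "not missing".
# The labels and the "unknown" default are illustrative assumptions.
defaultKind <- function(x) {
  kind <- rep(NA_character_, length(x))
  kind[is.na(x)] <- "unknown"   # a plain NA with no recorded reason
  kind
}

missingKind <- function(x) {
  if (inherits(x, "MultiMissing"))
    attr(x, "missingKind")
  else
    defaultKind(x)
}

`missingKind<-` <- function(x, value) {
  stopifnot(length(value) == length(x))
  baseClass <- setdiff(class(x), "MultiMissing")
  x <- unclass(x)
  x[!is.na(value)] <- NA        # every labelled element becomes a standard NA
  structure(x,
            missingKind = as.character(value),
            class = c("MultiMissing", baseClass))
}

as.character.MultiMissing <- function(x, ...) {
  kind <- missingKind(x)
  class(x) <- setdiff(class(x), "MultiMissing")  # avoid re-dispatching here
  vals <- as.character(x)
  ifelse(is.na(kind), vals, paste0("NA.", kind))
}

x <- 1:10
missingKind(x)[4] <- "refused"
as.character(x)   # element 4 is rendered as "NA.refused"
missingKind(x)    # NA everywhere except "refused" at position 4
```

With NA as the "observed" code, a kind of 0 (or any other value) is no longer reserved, and is.na(missingKind(x)) replaces the missingKind(x) != 0 tests.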
Duncan Murdoch
On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
> On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
>> Dear Duncan,
>>
>> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>
>> You've already been told how to solve this: just add attributes to the
>> objects. Use the standard NA to indicate that there is some kind of
>> missingness, and the attribute to describe exactly what it is. Stick a
>> class on those objects and define methods so that subsetting and
>> arithmetic preserves the extra info you've added. If you do some
>> operation that turns those NAs into NaNs, big deal: the attribute will
>> still be there, and is.na(NaN) still returns TRUE.
>>
>>
>> I've already tried the attributes way, it is not so easy.
>
> If you have specific operations that are needed but that you can't get
> to work, post the issue here.
>
>> In the best case scenario, it unnecessarily triples the size of the
>> data, but perhaps this is the only way forward.
>
> I don't see how it could triple the size. Surely an integer has enough
> values to cover all possible kinds of missingness. So on integer or
> factor data you'd double the size, on real or character data you'd
> increase it by 50%. (This is assuming you're on a 64 bit platform with
> 32 bit integers and 64 bit reals and pointers.)
>
> Here's a tiny implementation to show what I'm talking about:
>
> asMultiMissing <- function(x) {
>   if (isMultiMissing(x))
>     return(x)
>   missingKind <- ifelse(is.na(x), 1, 0)
>   structure(x,
>             missingKind = missingKind,
>             class = c("MultiMissing", class(x)))
> }
>
> isMultiMissing <- function(x)
>   inherits(x, "MultiMissing")
>
> missingKind <- function(x) {
>   if (isMultiMissing(x))
>     attr(x, "missingKind")
>   else
>     ifelse(is.na(x), 1, 0)
> }
>
> `missingKind<-` <- function(x, value) {
>   class(x) <- setdiff(class(x), "MultiMissing")
>   x[value != 0] <- NA
>   x <- asMultiMissing(x)
>   attr(x, "missingKind") <- value
>   x
> }
>
> `[.MultiMissing` <- function(x, i, ...) {
>   missings <- missingKind(x)
>   x <- NextMethod()
>   missings <- missings[i]
>   missingKind(x) <- missings
>   x
> }
>
> print.MultiMissing <- function(x, ...) {
>   vals <- as.character(x)
>   if (!is.character(x) || inherits(x, "noquote"))
>     print(noquote(vals))
>   else
>     print(vals)
> }
>
> `[<-.MultiMissing` <- function(x, i, value, ...) {
>   missings <- missingKind(x)
>   class(x) <- setdiff(class(x), "MultiMissing")
>   x[i] <- value
>   missings[i] <- missingKind(value)
>   missingKind(x) <- missings
>   x
> }
>
> as.character.MultiMissing <- function(x, ...) {
>   missings <- missingKind(x)
>   result <- NextMethod()
>   ifelse(missings != 0,
>          paste0("NA.", missings), result)
> }
>
> This is incomplete. It doesn't do printing very well, and it doesn't
> handle the case of assigning a MultiMissing value to a regular vector at
> all. (I think you'd need an S4 implementation if you want to support
> that.) But it does the basics:
>
> > x <- 1:10
> > missingKind(x)[4] <- 23
> > x
> [1] 1 2 3 NA.23 5 6 7 8 9
> [10] 10
> > is.na(x)
> [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
> [10] FALSE
> > missingKind(x)
> [1] 0 0 0 23 0 0 0 0 0 0
> >
>
> Duncan Murdoch
>
>>
>> Base R doesn't need anything else.
>>
>> You complained that users shouldn't need to know about attributes, and
>> they won't: you, as the author of the package that does this, will
>> handle all those details. Working in your subject area you know all the
>> different kinds of NAs that people care about, and how they code them in
>> input data, so you can make it all totally transparent. If you do it
>> well, someone in some other subject area with a completely different set
>> of kinds of missingness will be able to adapt your code to their use.
>>
>>
>> But that is the whole point: the package author does not define possible
>> NAs (the possibilities are infinite), users do that.
>> The package should only provide a simple method to achieve that.
>>
>>
>> I imagine this has all been done in one of the thousands of packages on
>> CRAN, but if it hasn't been done well enough for you, do it better.
>>
>>
>> If it were, I would have found it by now...
>>
>> Best wishes,
>> Adrian
>
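The size estimate earlier in this thread (double for integer data, 50% more for real data) can be checked roughly with object.size(). This is only a sketch: the vector length and the ratio() helper are arbitrary choices for illustration, and it assumes the kinds are stored with integer(). The literals 1 and 0 in ifelse(is.na(x), 1, 0) are doubles in R, so that code would need as.integer() or 1L/0L to get these figures.

```r
# Rough check of the storage overhead of an integer missingKind attribute.
# n and ratio() are illustrative choices, not part of the thread's code.
n <- 1e6
ints  <- integer(n)   # 32-bit integers
reals <- numeric(n)   # 64-bit doubles
kinds <- integer(n)   # one 32-bit kind code per element

ratio <- function(x) {
  tagged <- structure(x, missingKind = kinds)
  # object.size() includes the size of attributes
  as.numeric(object.size(tagged)) / as.numeric(object.size(x))
}

ratio(ints)    # close to 2
ratio(reals)   # close to 1.5
```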
More information about the R-devel
mailing list