[Rd] [External] Re: 1954 from NA
Greg Warnes
greg @end|ng |rom w@rne@@net
Thu Jun 3 23:01:23 CEST 2021
I would be glad to add this to one of my R packages, probably `gdata`..
-G
Gregory R. Warnes, Ph.D.
greg using warnes.net
Eternity is a long time, take a friend!
> On May 26, 2021, at 1:09 PM, Adrian Dușa <dusa.adrian using gmail.com> wrote:
>
> Yes, that is even better.
> Best,
> Adrian
>
> On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>>
> wrote:
>
>> After 5 minutes more thought:
>>
>> - code non-missing as missingKind = NA, not 0, so that missingKind could
>> be a character vector, or missingKind = 0 could be supported.
>>
>> - print methods should return the main argument, so mine should be
>>
>> print.MultiMissing <- function(x, ...) {
>> vals <- as.character(x)
>> if (!is.character(x) || inherits(x, "noquote"))
>> print(noquote(vals))
>> else
>> print(vals)
>> invisible(x)
>> }
>>
>> This still needs a lot of improvement to be a good print method, but
>> I'll leave that to you.
>>
>> Duncan Murdoch
>>
>> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
>>> On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
>>>> Dear Duncan,
>>>>
>>>> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <
>> murdoch.duncan using gmail.com
>>>> <mailto:murdoch.duncan using gmail.com>> wrote:
>>>>
>>>> You've already been told how to solve this: just add attributes
>> to the
>>>> objects. Use the standard NA to indicate that there is some kind of
>>>> missingness, and the attribute to describe exactly what it is.
>> Stick a
>>>> class on those objects and define methods so that subsetting and
>>>> arithmetic preserves the extra info you've added. If you do some
>>>> operation that turns those NAs into NaNs, big deal: the attribute
>> will
>>>> still be there, and is.na <http://is.na>(NaN) still returns TRUE.
>>>>
>>>>
>>>> I've already tried the attributes way, it is not so easy.
>>>
>>> If you have specific operations that are needed but that you can't get
>>> to work, post the issue here.
>>>
>>>> In the best case scenario, it unnecessarily triples the size of the
>>>> data, but perhaps this is the only way forward.
>>>
>>> I don't see how it could triple the size. Surely an integer has enough
>>> values to cover all possible kinds of missingness. So on integer or
>>> factor data you'd double the size, on real or character data you'd
>>> increase it by 50%. (This is assuming you're on a 64 bit platform with
>>> 32 bit integers and 64 bit reals and pointers.)
>>>
>>> Here's a tiny implementation to show what I'm talking about:
>>>
>>> asMultiMissing <- function(x) {
>>> if (isMultiMissing(x))
>>> return(x)
>>> missingKind <- ifelse(is.na(x), 1, 0)
>>> structure(x,
>>> missingKind = missingKind,
>>> class = c("MultiMissing", class(x)))
>>> }
>>>
>>> isMultiMissing <- function(x)
>>> inherits(x, "MultiMissing")
>>>
>>> missingKind <- function(x) {
>>> if (isMultiMissing(x))
>>> attr(x, "missingKind")
>>> else
>>> ifelse(is.na(x), 1, 0)
>>> }
>>>
>>> `missingKind<-` <- function(x, value) {
>>> class(x) <- setdiff(class(x), "MultiMissing")
>>> x[value != 0] <- NA
>>> x <- asMultiMissing(x)
>>> attr(x, "missingKind") <- value
>>> x
>>> }
>>>
>>> `[.MultiMissing` <- function(x, i, ...) {
>>> missings <- missingKind(x)
>>> x <- NextMethod()
>>> missings <- missings[i]
>>> missingKind(x) <- missings
>>> x
>>> }
>>>
>>> print.MultiMissing <- function(x, ...) {
>>> vals <- as.character(x)
>>> if (!is.character(x) || inherits(x, "noquote"))
>>> print(noquote(vals))
>>> else
>>> print(vals)
>>> }
>>>
>>> `[<-.MultiMissing` <- function(x, i, value, ...) {
>>> missings <- missingKind(x)
>>> class(x) <- setdiff(class(x), "MultiMissing")
>>> x[i] <- value
>>> missings[i] <- missingKind(value)
>>> missingKind(x) <- missings
>>> x
>>> }
>>>
>>> as.character.MultiMissing <- function(x, ...) {
>>> missings <- missingKind(x)
>>> result <- NextMethod()
>>> ifelse(missings != 0,
>>> paste0("NA.", missings), result)
>>>
>>> }
>>>
>>> This is incomplete. It doesn't do printing very well, and it doesn't
>>> handle the case of assigning a MultiMissing value to a regular vector at
>>> all. (I think you'd need an S4 implementation if you want to support
>>> that.) But it does the basics:
>>>
>>>> x <- 1:10
>>>> missingKind(x)[4] <- 23
>>>> x
>>> [1] 1 2 3 NA.23 5 6 7 8 9
>>> [10] 10
>>>> is.na(x)
>>> [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
>>> [10] FALSE
>>>> missingKind(x)
>>> [1] 0 0 0 23 0 0 0 0 0 0
>>>>
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> Base R doesn't need anything else.
>>>>
>>>> You complained that users shouldn't need to know about attributes,
>> and
>>>> they won't: you, as the author of the package that does this, will
>>>> handle all those details. Working in your subject area you know
>> all
>>>> the
>>>> different kinds of NAs that people care about, and how they code
>>>> them in
>>>> input data, so you can make it all totally transparent. If you do
>> it
>>>> well, someone in some other subject area with a completely
>> different
>>>> set
>>>> of kinds of missingness will be able to adapt your code to their
>> use.
>>>>
>>>>
>>>> But that is the whole point: the package author does not define possible
>>>> NAs (the possibilities are infinite), users do that.
>>>> The package should only provide a simple method to achieve that.
>>>>
>>>>
>>>> I imagine this has all been done in one of the thousands of
>> packages on
>>>> CRAN, but if it hasn't been done well enough for you, do it better.
>>>>
>>>>
>>>> If it were, I would have found it by now...
>>>>
>>>> Best wishes,
>>>> Adrian
>>>
>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org <mailto:R-devel using r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel <https://stat.ethz.ch/mailman/listinfo/r-devel>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list