[Rd] [External] Re: 1954 from NA
Adrian Dușa
du@@@@dr|@n @end|ng |rom gm@||@com
Wed May 26 19:09:50 CEST 2021
Yes, that is even better.
Best,
Adrian
On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:
> After 5 minutes more thought:
>
> - code non-missing as missingKind = NA, not 0, so that missingKind could
> be a character vector, or missingKind = 0 could be supported.
>
> - print methods should return the main argument, so mine should be
>
> print.MultiMissing <- function(x, ...) {
> vals <- as.character(x)
> if (!is.character(x) || inherits(x, "noquote"))
> print(noquote(vals))
> else
> print(vals)
> invisible(x)
> }
>
> This still needs a lot of improvement to be a good print method, but
> I'll leave that to you.
>
> Duncan Murdoch
>
> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
> > On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
> >> Dear Duncan,
> >>
> >> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <
> murdoch.duncan using gmail.com
> >> <mailto:murdoch.duncan using gmail.com>> wrote:
> >>
> >> You've already been told how to solve this: just add attributes
> to the
> >> objects. Use the standard NA to indicate that there is some kind of
> >> missingness, and the attribute to describe exactly what it is.
> Stick a
> >> class on those objects and define methods so that subsetting and
> >> arithmetic preserves the extra info you've added. If you do some
> >> operation that turns those NAs into NaNs, big deal: the attribute
> will
> >> still be there, and is.na <http://is.na>(NaN) still returns TRUE.
> >>
> >>
> >> I've already tried the attributes way, it is not so easy.
> >
> > If you have specific operations that are needed but that you can't get
> > to work, post the issue here.
> >
> >> In the best case scenario, it unnecessarily triples the size of the
> >> data, but perhaps this is the only way forward.
> >
> > I don't see how it could triple the size. Surely an integer has enough
> > values to cover all possible kinds of missingness. So on integer or
> > factor data you'd double the size, on real or character data you'd
> > increase it by 50%. (This is assuming you're on a 64 bit platform with
> > 32 bit integers and 64 bit reals and pointers.)
> >
> > Here's a tiny implementation to show what I'm talking about:
> >
> > asMultiMissing <- function(x) {
> > if (isMultiMissing(x))
> > return(x)
> > missingKind <- ifelse(is.na(x), 1, 0)
> > structure(x,
> > missingKind = missingKind,
> > class = c("MultiMissing", class(x)))
> > }
> >
> > isMultiMissing <- function(x)
> > inherits(x, "MultiMissing")
> >
> > missingKind <- function(x) {
> > if (isMultiMissing(x))
> > attr(x, "missingKind")
> > else
> > ifelse(is.na(x), 1, 0)
> > }
> >
> > `missingKind<-` <- function(x, value) {
> > class(x) <- setdiff(class(x), "MultiMissing")
> > x[value != 0] <- NA
> > x <- asMultiMissing(x)
> > attr(x, "missingKind") <- value
> > x
> > }
> >
> > `[.MultiMissing` <- function(x, i, ...) {
> > missings <- missingKind(x)
> > x <- NextMethod()
> > missings <- missings[i]
> > missingKind(x) <- missings
> > x
> > }
> >
> > print.MultiMissing <- function(x, ...) {
> > vals <- as.character(x)
> > if (!is.character(x) || inherits(x, "noquote"))
> > print(noquote(vals))
> > else
> > print(vals)
> > }
> >
> > `[<-.MultiMissing` <- function(x, i, value, ...) {
> > missings <- missingKind(x)
> > class(x) <- setdiff(class(x), "MultiMissing")
> > x[i] <- value
> > missings[i] <- missingKind(value)
> > missingKind(x) <- missings
> > x
> > }
> >
> > as.character.MultiMissing <- function(x, ...) {
> > missings <- missingKind(x)
> > result <- NextMethod()
> > ifelse(missings != 0,
> > paste0("NA.", missings), result)
> >
> > }
> >
> > This is incomplete. It doesn't do printing very well, and it doesn't
> > handle the case of assigning a MultiMissing value to a regular vector at
> > all. (I think you'd need an S4 implementation if you want to support
> > that.) But it does the basics:
> >
> > > x <- 1:10
> > > missingKind(x)[4] <- 23
> > > x
> > [1] 1 2 3 NA.23 5 6 7 8 9
> > [10] 10
> > > is.na(x)
> > [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
> > [10] FALSE
> > > missingKind(x)
> > [1] 0 0 0 23 0 0 0 0 0 0
> > >
> >
> > Duncan Murdoch
> >
> >>
> >> Base R doesn't need anything else.
> >>
> >> You complained that users shouldn't need to know about attributes,
> and
> >> they won't: you, as the author of the package that does this, will
> >> handle all those details. Working in your subject area you know
> all
> >> the
> >> different kinds of NAs that people care about, and how they code
> >> them in
> >> input data, so you can make it all totally transparent. If you do
> it
> >> well, someone in some other subject area with a completely
> different
> >> set
> >> of kinds of missingness will be able to adapt your code to their
> use.
> >>
> >>
> >> But that is the whole point: the package author does not define possible
> >> NAs (the possibilities are infinite), users do that.
> >> The package should only provide a simple method to achieve that.
> >>
> >>
> >> I imagine this has all been done in one of the thousands of
> packages on
> >> CRAN, but if it hasn't been done well enough for you, do it better.
> >>
> >>
> >> If it were, I would have found it by now...
> >>
> >> Best wishes,
> >> Adrian
> >
>
>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list