[Rd] 1954 from NA

Greg Minshall m|n@h@|| @end|ng |rom um|ch@edu
Mon May 24 13:11:30 CEST 2021


Adrian,

> If it was only one column then your solution is neat. But with 5-600
> variables, each of which can contain multiple missing values, to
> double this number of variables just to describe NA values seems to me
> excessive.  Not to mention we should be able to quickly convert /
> import / export from one software package to another. This would imply
> maintaining some sort of metadata reference of which explanatory
> additional factor describes which original variable.

one thing *i* should keep in mind is the old saying: "The difference
between theory and practice is that in theory there is no difference,
but in practice, there is."

but, in theory:

if you have 500 columns of possibly-NA'd variables, you could have one
extra column holding 500 "bits" (one per variable), where each "bit"
takes one of N values, N being the number of explanations the
corresponding column has for why the NA exists.
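
a rough sketch of that idea, in Python only for illustration. the names
(REASON_COUNTS, pack_reasons, unpack_reasons) are hypothetical, and two
things are my assumptions rather than anything settled in this thread:
code 0 means "value present", and the per-row codes are packed into one
integer using a mixed-radix scheme.

```python
# Assumed setup: three variables, each with its own number N of NA reasons.
# Code 0 means "value present"; codes 1..N mean "NA for reason k".
REASON_COUNTS = [2, 3, 1]


def pack_reasons(codes, counts=REASON_COUNTS):
    """Pack one row's per-variable reason codes into a single integer."""
    if len(codes) != len(counts):
        raise ValueError("need exactly one code per variable")
    packed = 0
    # Walk from the last variable to the first so that the first
    # variable ends up in the lowest-order "digit".
    for code, n in zip(reversed(codes), reversed(counts)):
        if not 0 <= code <= n:
            raise ValueError(f"code {code} out of range 0..{n}")
        packed = packed * (n + 1) + code
    return packed


def unpack_reasons(packed, counts=REASON_COUNTS):
    """Recover the per-variable reason codes from the packed integer."""
    codes = []
    for n in counts:
        packed, code = divmod(packed, n + 1)
        codes.append(code)
    return codes


row = [1, 0, 1]            # var 1: NA (reason 1), var 2: present, var 3: NA (reason 1)
packed = pack_reasons(row)
assert unpack_reasons(packed) == row
```

Python integers are arbitrary precision, so the same scheme would cover
500 variables; a fixed-width representation would need a different
layout.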

i guess the CS'y thing that comes to my mind here is that one thing is
the *semantics* of what you are trying to convey, and the other is how
those semantics are *encoded* in whatever representation you are using.
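
to make the semantics/encoding split concrete, here is a small sketch
(again Python, again hypothetical names and reason labels) in which two
different encodings decode to the same meaning: a "wide" one with an
extra entry per variable, and a "packed" one with a single string per
row.

```python
# Hypothetical reason labels; the thread does not fix any particular set.
REASONS = ["refused", "not applicable", "don't know"]


def decode_wide(row):
    """Encoding A: one extra '<name>_na' entry per variable (label or None)."""
    return {key[:-3]: value for key, value in row.items() if key.endswith("_na")}


def decode_packed(names, packed):
    """Encoding B: one character per variable; '0' = present, '1'.. = index into REASONS."""
    return {
        name: (REASONS[int(ch) - 1] if ch != "0" else None)
        for name, ch in zip(names, packed)
    }


wide_row = {"age_na": None, "income_na": "refused"}
packed_row = "01"

# Different encodings, same semantics.
assert decode_wide(wide_row) == decode_packed(["age", "income"], packed_row)
```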

cheers, Greg


