[Rd] 1954 from NA
Greg Minshall
m|n@h@|| @end|ng |rom um|ch@edu
Mon May 24 13:11:30 CEST 2021
Adrian,
> If it was only one column then your solution is neat. But with 500-600
> variables, each of which can contain multiple missing values, doubling
> the number of variables just to describe NA values seems to me
> excessive. Not to mention we should be able to quickly convert /
> import / export from one software package to another. This would imply
> maintaining some sort of metadata reference recording which additional
> explanatory factor describes which original variable.
one thing *i* should keep in mind is the old saying: "The difference
between theory and practice is that in theory there is no difference,
but in practice, there is."
but, in theory:
if you have 500 columns of possibly-NA'd variables, you could have one
column of 500 "bits", where each "bit" (really a small field) takes one of
N+1 values: one meaning "not missing", plus one for each of the N
explanations the corresponding column has for why a value is NA. N can
differ from column to column.
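a minimal sketch of the "one column of 500 bits" idea, written in Python only because the idea is language-neutral (nothing here is an R API). it assumes each column j has its own count n_j of NA explanations and that code 0 means "value present"; each row's per-column codes are then packed into a single integer by mixed-radix encoding, column j using radix n_j + 1. `pack_reasons` and `unpack_reasons` are hypothetical names.

```python
from typing import List, Sequence


def pack_reasons(codes: Sequence[int], n_reasons: Sequence[int]) -> int:
    """Pack one row's per-column NA codes into a single integer.

    codes[j] is 0 if column j is present, or 1..n_reasons[j] naming why
    column j is NA, so column j needs a radix of n_reasons[j] + 1.
    """
    if len(codes) != len(n_reasons):
        raise ValueError("one code per column is required")
    packed = 0
    # Walk columns last-to-first so column 0 lands in the lowest "digit".
    for code, n in zip(reversed(codes), reversed(n_reasons)):
        radix = n + 1
        if not 0 <= code < radix:
            raise ValueError(f"code {code} out of range for radix {radix}")
        packed = packed * radix + code
    return packed


def unpack_reasons(packed: int, n_reasons: Sequence[int]) -> List[int]:
    """Invert pack_reasons, recovering one code per column."""
    codes = []
    for n in n_reasons:
        packed, code = divmod(packed, n + 1)
        codes.append(code)
    if packed != 0:
        raise ValueError("packed value has more digits than there are columns")
    return codes


# Three columns: 2 NA reasons, none (never missing), 3 NA reasons.
n_reasons = [2, 0, 3]
row = [1, 0, 3]  # col 0 NA for reason 1, col 1 present, col 2 NA for reason 3
packed = pack_reasons(row, n_reasons)
print(packed)  # 10
assert unpack_reasons(packed, n_reasons) == row
```

a row with no NAs packs to 0, and Python integers are unbounded, so 500 fields fit in one value. rounding each radix up to a power of two would give literal bit fields; the mixed-radix form just avoids wasting unused codes.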
i guess the CS-y point that comes to mind here is that the *semantics*
of what you are trying to convey is one thing, and how those semantics
are *encoded* in whatever representation you are using is another.
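one way to make the semantics/encoding distinction concrete, as a hedged Python sketch: assume the semantics are just a set of (row, column, reason) facts. two hypothetical encodings then carry exactly the same facts, one parallel reason column per variable (the "doubling" approach) and one sparse map keyed by (row, column). the function names and sample data below are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple

# One fact: (row index, column name, reason label). The set of facts is
# the semantics; everything else is a choice of encoding.
Fact = Tuple[int, str, str]


def facts_from_parallel(reason_cols: Dict[str, List[Optional[str]]]) -> Set[Fact]:
    """Encoding A: one reason column per variable, None meaning 'not missing'."""
    return {
        (row, col, reason)
        for col, reasons in reason_cols.items()
        for row, reason in enumerate(reasons)
        if reason is not None
    }


def facts_from_sparse(sparse: Dict[Tuple[int, str], str]) -> Set[Fact]:
    """Encoding B: only the NA cells are stored, keyed by (row, column)."""
    return {(row, col, reason) for (row, col), reason in sparse.items()}


parallel = {
    "age": [None, "refused", None],
    "income": ["unknown", None, None],
}
sparse = {(0, "income"): "unknown", (1, "age"): "refused"}

# Different encodings, identical semantics.
assert facts_from_parallel(parallel) == facts_from_sparse(sparse)
```

the comparison goes through the decoded facts rather than the raw structures, which is the point: either encoding can be chosen for size or interchange convenience without changing what is being said.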
cheers, Greg