[R] handling NA by mean replacement
reilly at stat.auckland.ac.nz
Tue Jan 31 07:23:27 CET 2006
Here are a couple of documents that make much the same point (e.g. "mean
value imputation is not recommended"), and discuss several alternatives.
I think we'd need more information on the context to provide any real
advice. Another possible source of help is the Impute mailing list:
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
On 31/01/2006 6:20 a.m., Berton Gunter wrote:
> Lots of other folks will give you the simple answer (hint: ?'[' ?is.na)
> Yours is one of those "iceberg" questions -- 2/3 hidden underwater.
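For the "simple answer" hinted at above (`?'['` and `?is.na`), here is a minimal sketch, assuming every column is numeric. The data frame `df` and its values are made up for illustration. Each column mean is computed from the observed values before anything is replaced, which is what the original question asked for.

```r
# Toy data with missing values (numbers are made up for illustration)
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, 2, 2, NA))

# Column means from the observed values only, i.e. before any NA is replaced
col_means <- colMeans(df, na.rm = TRUE)

# Data-frame version: index each column with is.na() and assign via `[<-`
filled <- df
for (j in seq_along(filled)) {
  filled[[j]][is.na(filled[[j]])] <- col_means[[j]]
}

# Matrix version: a single vectorised assignment using (row, col) indices
m <- as.matrix(df)
idx <- which(is.na(m), arr.ind = TRUE)   # one row per NA: (row, col)
m[idx] <- col_means[idx[, 2]]            # pick the mean of each NA's column
```

The matrix form avoids the explicit loop, which matters only for wide data. Whether doing this at all is wise is a separate question.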
> Two points:
> Point 1: Generally you **don't have to do such replacement** as most of R's
> functions have a na.rm or na.action argument (unfortunately, for historical
> reasons, the argument names and meanings aren't consistent) that does
> basically what you want anyway.
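To illustrate Point 1, here is a small sketch of three base R functions that skip missing values without any replacement. It also shows the inconsistent argument names mentioned above: `na.rm` for `mean()`, `use` for `cor()`, and `na.action` for model-fitting functions such as `lm()`. The vectors and data frame are made-up examples.

```r
x <- c(2, 4, NA, 6)
mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # 4: mean of the three observed values

# Correlation uses a differently named argument, `use`
u <- c(1, 2, NA, 4, 5)
v <- c(2, 4, 6, NA, 10)
r <- cor(u, v, use = "complete.obs")   # pairs containing an NA are dropped

# Model-fitting functions take `na.action`; na.omit drops incomplete rows
d <- data.frame(y = c(1.0, 2.1, 2.9, NA, 5.2),
                x = c(1, 2, 3, 4, NA))
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)                # 3: only the complete rows were used
```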
> Point 2: Doing what you ask is probably a bad idea, as it creates mythical
> degrees of freedom and biases results --> gives wrong statistical answers.
> As a general matter, handling missing values "correctly" is a difficult
> statistical issue that you may want to avoid if you can (R has plenty of
> packages that can deal with it, but it requires background expertise).
> Honestly, I'm not sure "if you can" makes any sense here (how do you know?),
> but let's just say that I think your potential for mischief is reduced if
> you use R's inbuilt arguments for ignoring missing values rather than imputing
> them naively.
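The "mythical degrees of freedom" in Point 2 can be shown with simulated data. This is a sketch under the assumption that values are missing completely at random, which is the most favourable case for mean imputation. Imputed values sit exactly at the mean, so they add nothing to the sum of squares but still add to the count. As a result, the variance shrinks by exactly (n_obs - 1) / (n - 1), and a naive standard error treats all 200 values as real observations.

```r
set.seed(1)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)

# Delete 60 values completely at random (an assumption; real data rarely is MCAR)
x_miss <- x
x_miss[sample(n, 60)] <- NA
n_obs <- sum(!is.na(x_miss))     # 140 observed values

# Naive mean imputation
x_imp <- x_miss
x_imp[is.na(x_imp)] <- mean(x_miss, na.rm = TRUE)

# Same sum of squares, larger count: variance shrinks by (n_obs - 1) / (n - 1)
v_obs <- var(x_miss, na.rm = TRUE)
v_imp <- var(x_imp)

# A naive standard error of the mean is therefore too small
se_obs <- sqrt(v_obs / n_obs)
se_imp <- sqrt(v_imp / n)
c(se_obs = se_obs, se_imp = se_imp)
```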
> Having said that, I believe that clustering procedures, for example, may not
> permit this (but they have built-in missing-value imputation capabilities of their
> own, do they not?), so you may have to impute. In this case, try to do so
> wisely (e.g. via multiple imputation?).
> Perhaps this will stimulate real experts to offer you some advice. Good
> Bert Gunter
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Julie Bernauer
>> Sent: Monday, January 30, 2006 8:50 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] handling NA by mean replacement
>> I am sorry for such a stupid question. Suppose I have a table of data with a
>> lot of NAs, and I want to replace those NAs with the mean of each column
>> (computed before any NA replacement). How can I do that efficiently?
>> Thanks in advance,
>> Julie Bernauer
>> Yeast Structural Genomics
>> R-help at stat.math.ethz.ch mailing list
>> PLEASE do read the posting guide!