[R] handling NA by mean replacement

James Reilly reilly at stat.auckland.ac.nz
Tue Jan 31 07:23:27 CET 2006


Here are a couple of documents that make much the same point (e.g. "mean
value imputation is not recommended"), and discuss several alternatives.

http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm

I think we'd need more information on the context to provide any real
advice. Another possible source of help is the Impute mailing list:
http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:
> Lots of other folks will give you the simple answer (hint: ?'['  ?is.na)
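
For reference, a minimal sketch of that simple answer using is.na and '[' as hinted. The data frame dat and the helper impute_mean are made up for illustration and are not from the original post; the sketch assumes every column is numeric.

```r
# Hypothetical example data: numeric columns with some missing values.
dat <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8))

# Replace each NA in a vector with the mean of its observed values.
# (impute_mean is an illustrative helper name, not a base R function.)
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply column by column; dat[] keeps the data frame structure.
dat[] <- lapply(dat, impute_mean)
```

Each column mean is taken over the observed values only, i.e. before any replacement. A column that is entirely NA would end up filled with NaN, so such columns probably need separate handling.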
> 
> Yours is one of those "iceberg" questions  -- 2/3 hidden underwater.
> 
> Two points:
> 
> Point 1: Generally you **don't have to do such replacement** as most of R's
> functions have a na.rm or na.action argument (unfortunately, for historical
> reasons, the argument names and meanings aren't consistent) that does
> basically what you want anyway.
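
A short sketch of what Point 1 looks like in practice, using made-up data (x and d are illustrative only). It also shows the inconsistency in argument names: na.rm for mean(), na.action for lm(), and use for cor().

```r
x <- c(2, NA, 4)
mean(x)                # NA: missing values propagate by default
mean(x, na.rm = TRUE)  # 3: the NA is dropped before averaging

# Hypothetical data with one NA in each variable.
d <- data.frame(y = c(1, 2, NA, 4), x = c(1, 2, 3, NA))

# Model-fitting functions take na.action; na.omit drops incomplete rows.
fit <- lm(y ~ x, data = d, na.action = na.omit)
n_used <- nobs(fit)    # 2: only the two complete rows are used

# cor() uses yet another argument name for the same idea.
r <- cor(d$y, d$x, use = "complete.obs")
```

In none of these calls is a value invented for the missing cells; the incomplete observations are simply left out.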
> 
> Point 2: Doing what you ask is probably a bad idea: it creates mythical
> degrees of freedom and biases the results, so it gives wrong statistical
> answers.
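
A small simulated illustration of Point 2, under assumed settings (100 standard normal draws, 30 of them set to NA at random; none of this is from the original thread). The imputed values sit exactly at the mean, so they add nothing to the sum of squares while inflating the apparent sample size.

```r
set.seed(1)
x <- rnorm(100)

# Knock out 30 values at random to mimic missing data.
x_obs <- x
x_obs[sample(100, 30)] <- NA

# Naive mean imputation.
x_imp <- x_obs
x_imp[is.na(x_imp)] <- mean(x_obs, na.rm = TRUE)

sd_obs <- sd(x_obs, na.rm = TRUE)  # spread of the 70 observed values
sd_imp <- sd(x_imp)                # smaller: same sum of squares, n = 100

# Standard errors of the mean: the imputed version looks more precise
# than the data can justify.
se_obs <- sd_obs / sqrt(70)
se_imp <- sd_imp / sqrt(100)
```

Here sd_imp equals sd_obs * sqrt(69/99), so the standard deviation shrinks and the standard error shrinks further, which is one way the "mythical degrees of freedom" show up.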
> 
> As a general matter, handling missing values "correctly" is a difficult
> statistical issue that you may want to avoid if you can (R has plenty of
> packages that can deal with it, but it requires background expertise).
> Honestly, I'm not sure "if you can" makes any sense here (how do you know?),
> but let's just say that I think your potential for mischief is reduced if
> you use R's inbuilt arguments for ignoring missings rather than imputing
> them naively.
> 
> Having said that, I believe that clustering procedures, for example, may not
> permit this (but they have builtin missing imputation capabilities of their
> own, do they not?), so you may have to impute. In this case, try to do so
> wisely (e.g. via multiple imputation?). 
> 
> Perhaps this will stimulate real experts to offer you some advice. Good
> luck.
> 
> Cheers,
> Bert
>  
> Bert Gunter
> Genentech
> 
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch 
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Julie Bernauer
>> Sent: Monday, January 30, 2006 8:50 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] handling NA by mean replacement
>>
>> Hello
>>
>> I am sorry for such a stupid question. Suppose I have a table of data with
>> a lot of NAs, and I want to replace each NA with the mean of its column
>> (computed before the replacement). How can I do that efficiently?
>>
>> Thanks in advance,
>>
>> Julie
>>
>> -- 
>> Julie Bernauer
>> Yeast Structural Genomics
>> http://www.genomics.eu.org
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html
>>
> 