[R] How to replace outliers by group median?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Jun 16 16:14:14 CEST 2009


Replacing outliers with the group median is a sure way to get a biased 
variance estimate.
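
To see the bias, here is a minimal simulation sketch in base R. The outlier 
rule (1.5 * IQR boxplot fences), the sample size of 20, and the helper name 
replace_by_median are assumptions chosen only for illustration; none of them 
come from the original question. The data are standard normal, so the true 
variance is 1 and there is nothing wrong with any observation.

```r
set.seed(1)

# Hypothetical helper: flag points outside the 1.5 * IQR fences
# (an assumed outlier rule) and overwrite them with the sample median.
replace_by_median <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  out <- x < q[1] - fence | x > q[2] + fence
  x[out] <- median(x)
  x
}

# Repeat on many clean normal samples; true variance is 1.
sims <- replicate(2000, {
  x <- rnorm(20)
  c(raw = var(x), replaced = var(replace_by_median(x)))
})

# Average variance estimate before and after replacement
print(rowMeans(sims))
```

On average the variance after replacement should come out below the raw 
sample variance, even though the data contained no errors.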

Instead, use a robust dispersion (scale) estimator such as Gini's mean 
difference (the average absolute difference over all pairs of observations). 
The median is a robust location estimator; there are others.  If your 
ultimate goal is a comparison, you can use a robust nonparametric test.
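
As a rough sketch of these suggestions in base R, using the data from the 
question below: the function name gini_md is my own label, and the choice of 
the Kruskal-Wallis test is only one example of a nonparametric comparison, 
not necessarily the right one for the actual study.

```r
# Data copied from the original question
d <- data.frame(
  population = factor(rep(c("YXPy01", "BSPy01", "YLPy01"), times = c(7, 6, 8))),
  conlen3 = c(8.6, 8.1, 7.6, 7.6, 23, 7.6, 7.6,
              7.5, 6.4, 5.4, 15, 6.6, 5.5,
              5.4, 5.4, 5.6, 21, 5.4, 5.4, 5.4, 4.9)
)

# Gini's mean difference: mean of |x_i - x_j| over all pairs i != j.
# outer() counts each pair twice, and n * (n - 1) is the matching divisor.
gini_md <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# Robust location and scale per group, with no observations altered
groups <- split(d$conlen3, d$population)
med <- sapply(groups, median)
gmd <- sapply(groups, gini_md)
print(data.frame(median = med, gini_md = gmd))

# One possible rank-based comparison of the three groups
kw <- kruskal.test(conlen3 ~ population, data = d)
print(kw)
```

Nothing is deleted or replaced here; the extreme values stay in the data 
and the summaries are simply less sensitive to them.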

You'll find that the word 'outlier' is hard to define so it's best left 
undefined and unused.

Frank


Mao Jianfeng wrote:
> Dear R-helpers,
> 
> Even a very small number of outliers can greatly affect the mean and many
> other statistics of a numeric variable. So we usually must deal with
> outliers properly during data analysis. Here, I want to replace outliers
> with the group median of the variable, but I cannot work out an efficient
> way to do that because I am new to R and programming.
> 
> Can anybody share an R script to do that? I think it would also be valuable
> to many others who do numerical data analysis.
> 
> Here is a dummy dataframe with a group variable (three levels) and a numeric
> one. I just want to know how to replace outliers by group median.
> 
> population    conlen3
> YXPy01    8.6
> YXPy01    8.1
> YXPy01    7.6
> YXPy01    7.6
> YXPy01    23
> YXPy01    7.6
> YXPy01    7.6
> BSPy01    7.5
> BSPy01    6.4
> BSPy01    5.4
> BSPy01    15
> BSPy01    6.6
> BSPy01    5.5
> YLPy01    5.4
> YLPy01    5.4
> YLPy01    5.6
> YLPy01    21
> YLPy01    5.4
> YLPy01    5.4
> YLPy01    5.4
> YLPy01    4.9
> 
> Thank you very much in advance.
> 
> Best regards,
> Mao  J-F


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



