[R] how to identify the outliers

Liaw, Andy andy_liaw at merck.com
Tue Nov 26 17:42:52 CET 2002


First, a bit of rhetoric:

Rejecting data points solely on the grounds of being "statistical outliers"
probably should be outlawed.  (Sounds like that's what you're trying to do.)
You need to investigate these data points so that you understand the reason
for their "outlyingness" before you decide whether their exclusion makes
sense or not.  Exclusion based purely on statistical criteria almost
guarantees irreproducible research.

Some explanation:  Any "statistical outliers" are with reference to a model
(formal or conceptual).  They reflect lack of fit of the model to the data.
Rejecting these data points on statistical grounds means you believe the
model more than the data, which is not such a good idea for scientific
research.
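
To make the "with reference to a model" point concrete, here is a small
sketch.  The data are made up for illustration (exact powers of 2), and the
boxplot rule stands in for whatever criterion you might use.  The same
values are flagged on the raw scale and not flagged on the log scale, so
whether a point is an "outlier" depends on the model you have in mind.

```r
# Hypothetical data: exact powers of 2, i.e. steady growth on a log scale.
x <- 2^(0:10)

# Boxplot rule applied to the raw values (implicit model: roughly symmetric data).
raw_flagged <- boxplot.stats(x)$out

# The same rule applied to log2(x) (implicit model: symmetric on the log scale).
log_flagged <- boxplot.stats(log2(x))$out

raw_flagged          # the two largest values are flagged
length(log_flagged)  # nothing is flagged on the log scale
```

Nothing about the data changed between the two calls; only the reference
model did.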

The "outliers" indicated by boxplots are based on a criterion: points lying
beyond the upper/lower hinges +/- k*IQR, where k is conventionally 1.5 or 3
(R's default is coef = 1.5; see ?boxplot.stats for the definitions).  In
fact boxplot.stats() returns both the whisker ends that boxplot() draws
($stats) and the points it flags as outliers ($out).
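
If it helps, here is a minimal sketch of getting at the positions of the
points a boxplot would flag, so that you can go and look at them.  The
vector x is invented for illustration, and coef = 1.5 is simply the default
value; the hand-computed fences are only there to show what the rule does.

```r
# Hypothetical data: ten ordinary values plus two extreme ones.
x <- c(9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.5, 25.0, -3.0)

bs <- boxplot.stats(x, coef = 1.5)  # coef = 1.5 is the default
bs$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out    # values lying beyond the whiskers

# Positions of the flagged values in x, for inspection rather than deletion.
pos <- which(x %in% bs$out)

# The same positions from the fences computed by hand:
# hinges -/+ coef * (distance between the hinges).
hinges <- bs$stats[c(2, 4)]
fences <- hinges + c(-1, 1) * 1.5 * diff(hinges)
pos_by_hand <- which(x < fences[1] | x > fences[2])
```

Once you have pos, x[pos] shows you the values themselves, which is where
the investigating starts.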

Andy

-----Original Message-----
From: Rado Bonk [mailto:rbonk at host.sk]
Sent: Tuesday, November 26, 2002 10:36 AM
To: r-help at stat.math.ethz.ch
Subject: [R] how to identify the outliers


Hello R-users,

Is there any more sophisticated way to identify the outliers in a
dataset other than seeing them in a boxplot? I want to exclude them from
further analysis, and I am interested in their position in my data
vector.

Rado

-- 
Radoslav Bonk M.S.
Dept. of Physical Geography and Geoecology
Faculty of Sciences, Comenius University
Mlynska Dolina 842 15, Bratislava, SLOVAKIA
tel: +421 2 602 96 250 e-mail: rbonk at host.sk
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._
