# [R] outliers/interval data extraction

Christian Hennig hennig at stat.math.ethz.ch
Thu Feb 20 18:55:03 CET 2003

```
Hi,

the boxplot is based on the quartiles, which are much less sensitive to
outliers than the mean and SD, so it should not be "heavily distorted by
outliers". Presumably you mean that the main bulk of the data shows up
only as a very small box on the screen because of your outliers.
However, a simple, straightforward method for outlier identification is
median +/- 5.2*mad, as suggested by Hampel, Technometrics 27 (1985) 95-107.
Outlier identification based on the mean and SD is often bad because these
statistics are strongly influenced by the outliers themselves.

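# x is the numeric data vector
medx <- median(x)
madx <- mad(x)  # note: R's mad() rescales by 1.4826 by default
outliers <- abs(x - medx) > 5.2 * madx  # TRUE where a value is flagged
selected <- x[!outliers]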

Best,
Christian

On 20 Feb 2003, Rado Bonk wrote:

> Dear R-users,
>
> I have two outliers related questions.
>
> I.
> I have a vector consisting of 69 values.
>
> mean = 0.00086
> SD = 0.02152
>
> The shape of EDA graphics (boxplots, density plots) is heavily distorted
> due to outliers. How should I define the interval for excluding outliers?
> Is the <mean - 2SD, mean + 2SD> interval a correct approach?
>
> Or should I define 95% (or 99%) limit of agreement for data interval,
> and exclude lower, and higher values?
>
> II.
> How to extract only those values from vector which fulfill the condition
> of interval (higher than A, and lower than B)?
>

--
***********************************************************************
Christian Hennig
Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently)
and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at stat.math.ethz.ch, http://stat.ethz.ch/~hennig/
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/

```
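
For anyone reading this thread later, here is a minimal, self-contained sketch of the two things discussed above: the median +/- 5.2*mad rule from the reply, and the plain interval extraction asked about in question II. The data vector and the bounds `A` and `B` are made up for illustration and are not from the original post. The sketch uses R's `mad()` with its default scaling constant, which may or may not match the scaling intended in Hampel's paper, so treat the 5.2 cutoff as something to check rather than a fixed recipe.

```r
# Illustrative data (made up): a tight bulk of values plus two gross outliers.
x <- c(-0.012, -0.008, -0.005, -0.003, -0.001, 0.000, 0.001,
       0.002, 0.004, 0.006, 0.009, 0.011, 0.150, -0.120)

# Robust rule from the reply: flag values far from the median in MAD units.
# Assumption: mad() is used with its default constant (1.4826);
# use mad(x, constant = 1) if the raw median absolute deviation is wanted.
medx <- median(x)
madx <- mad(x)
outliers <- abs(x - medx) > 5.2 * madx   # logical vector, TRUE = flagged
selected <- x[!outliers]                 # values kept after removing outliers

# Question II: keep only the values strictly between two bounds.
A <- -0.01   # hypothetical lower bound
B <-  0.01   # hypothetical upper bound
in_interval <- x[x > A & x < B]

print(x[outliers])    # the flagged values
print(in_interval)    # the values inside (A, B)
```

Logical indexing does the work in both cases: the comparison yields a TRUE/FALSE vector of the same length as `x`, and `x[...]` keeps the positions where it is TRUE. Use `>=` and `<=` instead if the bounds themselves should be included.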