[R] Outlier removal techniques

Rich Shepard rshepard at appl-ecosys.com
Thu Feb 9 17:55:08 CET 2012


On Thu, 9 Feb 2012, mails wrote:

> I need to analyse a data matrix with dimensions of 30x100. Before
> analysing the data there is, however, a need to remove outliers from the
> data. I read quite a lot about outlier removal already and I think the
> most common technique for that seems to be Principal Component Analysis
> (PCA). However, I think that this technique is quite subjective. When is
> an outlier an outlier? I uploaded an example PCA plot here:

   Those more expert than I will certainly provide answers. What I do with
new data is create box-and-whisker plots (I use the lattice package), which
flag as outliers those data lying more than 1.5 times the interquartile
range below the first quartile or above the third quartile.
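
   A minimal sketch of that box-plot rule in base R, in case it helps. The
matrix m, the planted value, and the helper flag_outliers() are made up for
illustration; boxplot.stats() is the base function that computes the whisker
limits, with coef = 1.5 as its default multiplier. It only flags candidates;
it does not decide whether they should be removed.

```r
# Hypothetical data: a 30 x 100 matrix like the one in the question,
# with one extreme value planted in row 5, column 10.
set.seed(1)
m <- matrix(rnorm(30 * 100), nrow = 30, ncol = 100)
m[5, 10] <- 12

# Flag values lying beyond the box-plot whiskers of a numeric vector.
# boxplot.stats()$stats holds: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end. The whisker ends are the most extreme
# data points within coef * IQR of the hinges.
flag_outliers <- function(x, coef = 1.5) {
  s <- boxplot.stats(x, coef = coef)$stats
  x < s[1] | x > s[5]
}

# Apply the rule column by column; the result is a 30 x 100 logical matrix.
flags <- apply(m, 2, flag_outliers)

# Row and column indices of the flagged values, for inspection.
candidates <- which(flags, arr.ind = TRUE)
```

   Looking through candidates before deleting anything keeps the decision
with you. A call such as lattice::bwplot(~ m[, 10]) should draw the matching
plot for one column, with the flagged points shown beyond the whiskers.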

   No one but you can answer your question on when an outlier is an outlier.
It depends on your data set and the context of the data. For example, a
water chemistry value that far exceeds a regulatory threshold might be
meaningful in the context of a one-off excursion (in which case it's not an
outlier but a real data point) or it might result from a handling,
instrumentation, or analytical error (in which case toss it as an outlier).

Rich


