[R] how to find outliers from the list of values
Bert Gunter
gunter.berton at gene.com
Thu May 17 17:56:57 CEST 2012
Petr et al.:
FWIW (probably not much).
As you know, tens of thousands of pages about "outliers" have been
written by statisticians. IMHO, the concept is another of the really
terrible ideas of our discipline and has led to much scientific abuse, as
indicated by this posting. For this reason, I have eliminated the term from
my vocabulary, using instead "unusual" or "unexpected" values, whose
meaning and purpose are pretty much as you described -- to bring the
user's attention to data issues that may require investigation and
intervention.
I feel that eliminating the term excises the notion that there can
somehow be statistical tests (alone) that can, irrespective of
scientific context, identify "illegitimate" data. A really dangerous
and pernicious idea, IMHO.
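
For what it is worth, here is a minimal sketch of flagging "unusual"
values for investigation rather than testing for "illegitimate" data. It
uses the eight sample values from the original question quoted below. The
median/MAD score and the cutoff of 3.5 are assumptions chosen for
illustration only, not a recommendation and not anything from the
mvoutlier package; the variable names are hypothetical.

```r
# Sample values copied from the original question.
x <- c(11489, 11008, 11873, 80000000, 9558, 8645, 8024, 8371)

# Robust centre and spread: unlike the mean and standard deviation,
# the median and MAD are barely moved by a single extreme value.
centre <- median(x)
spread <- mad(x)  # base R; scaled by the constant 1.4826 by default

# Distance of each value from the centre, in units of the robust spread.
robust_score <- abs(x - centre) / spread

# Assumed cutoff, for illustration only; the right value depends on context.
cutoff <- 3.5
unusual <- which(robust_score > cutoff)

# Report the flagged values for inspection; nothing is deleted.
print(data.frame(position = unusual, value = x[unusual]))
```

This only points at values worth a second look. Whether a flagged value
is a typing mistake, a real measurement, or something else is a question
for the data source and the scientific context, not for the score.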
Best,
Bert
On Thu, May 17, 2012 at 6:44 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
>
> I have not seen any answer yet, but maybe there is nobody who wants to
> touch the elusive object of "outlier". Neither do I, but here are some
> ideas how one can proceed.
>
> First of all, it's always up to you what is considered an outlier and how
> you will deal with it.
>
> I usually call an outlier any item which does not fit the pattern, and
> the pattern is usually best observed with some plotting function. You can
> identify outlying points, inspect the data source, and correct typing
> mistakes; only if the value was really measured and you cannot find any
> reason why it has such a value is it a real outlier. Then ***you*** need to
> decide what to do with it - discard it, treat it as coming from some
> long-tailed distribution, ...
>
> So here are my $0.02 on the outlier theme.
>
> Regards
> Petr
>
>>
>> Hi,
>> I am new to R and I would like to get your help in finding
>> 'outliers'.
>> I have the mvoutlier package installed on my system and added the package.
>> But I am not able to find a function from the 'mvoutlier' package which
>> will identify 'outliers'.
>> This is the sample list of data I have got, which has one outlier:
>> 11489 11008 11873 80000000 9558 8645 8024 8371
>> It would be of great help if somebody has an example script for the same.
>>
>> Thanks & Regards,
>> Thomas
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm