[R] Outlier statistics question

Wed Dec 1 04:53:43 CET 2010

It is, perhaps, more apt to call tests of outliers "tests of outright liars".

"Lies, damned lies, and tests of outliers"

Ravi. 
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
Date: Tuesday, November 30, 2010 4:22 pm
Subject: Re: [R] Outlier statistics question
To: Jahan <jahan.mohiuddin at gmail.com>
Cc: r-help at r-project.org

> (Apologies to all. I am weak and could not resist)
> 
> On Tue, Nov 30, 2010 at 12:15 PM, Jahan <jahan.mohiuddin at gmail.com> wrote:
> > I have a statistical question.
> > The data sets I am working with are right-skewed so I have been
> > plotting the log transformations of my data.  I am using a Grubbs Test
> > to detect outliers in the data, but I get different outcomes depending
> > on whether I run the test on the original data or the log(data).
> 
> Of course!
> 
> > Here is one of the problematic sets:
> >
> > fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
> > stripchart(fgf2p50,vertical=TRUE)
> > #This next step requires you have the 'outliers' package
> > library(outliers)
> > grubbs.test(fgf2p50)
> > #the output says p<0.05 so 5.047 is an outlier
> > #Next, I run the test on the log10-transformed data
> > log_fgf2p50=c(0.194,0.335,0.403,0.436,0.388,0.703)
> > grubbs.test(log_fgf2p50)
> > #the output says p>0.05, so we fail to reject the hypothesis of no outlier.
> >
> > The question is, which outlier test do I accept?
> 
> Neither.
> 
> (IMHO) Outlier tests are one of statistics's _bad ideas._ The Grubbs
> test is ca. 1970. There are many better approaches these days --
> consult your local statistician -- all of which will depend on
> answering the question,  "What is the question you are trying to
> answer?"
> 
> -- Bert
> 
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > 
> > PLEASE do read the posting guide 
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 
> -- 
> Bert Gunter
> Genentech Nonclinical Biostatistics
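
The disagreement between the two runs in the quoted question can be reproduced in base R without the 'outliers' package. The sketch below is only an illustration under stated assumptions: grubbs_max is a hypothetical helper, not a function from any package, and it uses the textbook one-sided critical value for the largest observation, which I believe (but have not confirmed against the package source) matches the default behaviour of grubbs.test.

```r
# Data from the post; the log10 values are recomputed rather than typed in.
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

# Hypothetical helper (not part of the 'outliers' package): Grubbs statistic
# for the largest value, plus the one-sided critical value at level alpha.
grubbs_max <- function(x, alpha = 0.05) {
  n <- length(x)
  # Distance of the maximum from the mean, in sample standard deviations.
  G <- (max(x) - mean(x)) / sd(x)
  # Critical value from the t distribution with n - 2 degrees of freedom.
  t_crit <- qt(1 - alpha / n, df = n - 2)
  G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  c(G = G, G_crit = G_crit)
}

raw    <- grubbs_max(fgf2p50)
logged <- grubbs_max(log10(fgf2p50))
print(rbind(raw, logged))
```

The critical value depends only on n and alpha, so it is the same in both rows. What changes is G: the log transform compresses the right tail, so 5.047 sits about 1.92 standard deviations above the mean on the raw scale but only about 1.76 on the log scale, which puts it on opposite sides of the cutoff. The test is answering a question about normality on whichever scale it is given, which is why neither result can be "accepted" without first deciding which scale is meaningful.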