[R] dixon test

giov biowoman at libero.it
Wed Aug 13 11:59:32 CEST 2008


Hi,
thank you very much for your helpful reply =). Just one question: I don't
know what distribution my data follow (normal, t, etc.), so how should I set
the type parameter? Is there a type value to use for a distribution-free
statistical test?

Thank you so much!


Fernando Marmolejo-Ramos wrote:
> 
> Hi giov,
> 
> About the Dixon test: I just ran a simple test with a sample of 40 and
> got:
> 
> Error in dixon.test(x) : Sample size must be in range 3-30
> 
> So it seems that most of the tests in the "outliers" package are designed
> for small samples. See also the R News article "Processing data for
> outliers" by Lukasz Komsta (the developer of the package), published in
> May 2006 (vol. 6/2).
> 
> However, that package also has a function called "scores" which works
> for large samples. It gives the z scores and p-values for your
> observations, so you can determine which values are considered outliers.
> 
> Try this simple syntax:
> 
> library(outliers)
> library(gamlss.dist)
> 
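> # this draws 100 values from an ex-Gaussian (exponential + Gaussian)
> # distribution, which usually has heaps of outliers!
> x <- rexGAUS(100,2000,3000,5000)
> 
> # this stops with an error, confirming that Dixon only works for samples
> # between 3 and 30 (try() lets the rest of the script keep running)
> try(dixon.test(x))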
> 
> # just to see what the data set looks like and visually confirm the
> # outliers
> boxplot(x, notch=TRUE)
> 
> # sort the observations in ascending order
> sort(x)
> 
> # returns the p-value of each observation (based on its z score), in
> # ascending order
> sort(scores(x, type="z", prob=1))
> 
> # lists the observations whose z score exceeds the 0.95 probability
> # level, i.e. the values flagged as outliers
> sort(x[scores(x, type="z", prob=0.95)])
> 
> Regarding the "prob" argument, the author's help page says:
> 
> prob ---- If set, the corresponding p-values instead of scores are given.
> If value is set to 1, p-value are returned. Otherwise, a logical vector is
> formed, indicating which values are exceeding specified probability. In
> "z" and "mad" types, there is also possibility to set this value to zero,
> and then scores are confirmed to (n-1)/sqrt(n) value, according to
> Shiffler (1998). The "iqr" type does not support probabilities, but "lim"
> value can be specified. 
> 
> The Shiffler reference is not the one that appears in the help. It is
> this one:
> 
> Shiffler, R. E. (1988). Maximum Z scores and outliers. The American
> Statistician, 42(1), 79-80.
> 
> I hope this helps,
> 
> Fernando
> 
> 
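
The (n-1)/sqrt(n) value mentioned in the quoted help text is the largest z score any observation can reach in a sample of size n when the sample standard deviation is used. A minimal base R sketch of that bound follows; the helper name max_z is only for illustration and is not part of the "outliers" package.

```r
# Largest possible |z| in a sample of size n, using sd() (n - 1 denominator).
# max_z is a hypothetical helper name, not a function from any package.
max_z <- function(n) (n - 1) / sqrt(n)

# The bound is reached when n - 1 values are identical and one value differs.
n <- 10
x <- c(rep(0, n - 1), 1)
z <- (x - mean(x)) / sd(x)

max(abs(z))  # largest z score actually observed in this sample
max_z(n)     # theoretical maximum for n = 10
```

One consequence is that a fixed cut-off such as |z| > 3 can never flag anything when n is 10 or less, because the bound itself stays below 3 there.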

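On the distribution-free question: the help text quoted above mentions "mad" and "iqr" types, which rely on the median and quartiles rather than on the mean and standard deviation. The sketch below shows the general idea in base R only. The helper names, the 3.5 cut-off and the lim = 1.5 default are assumptions made for illustration, and the code is not claimed to reproduce what scores() does internally.

```r
# Hypothetical helper: distance of each value from the median, in units of
# the median absolute deviation. Note that stats::mad() rescales by 1.4826
# by default so that it matches the standard deviation for normal data.
mad_scores <- function(x) (x - median(x)) / mad(x)

# Hypothetical helper: TRUE for values lying more than lim * IQR below the
# lower quartile or above the upper quartile (the usual boxplot rule).
iqr_flags <- function(x, lim = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  x < q[1] - lim * IQR(x) | x > q[2] + lim * IQR(x)
}

# Small made-up sample with one obviously extreme value
x <- c(10, 11, 12, 12, 13, 13, 14, 15, 16, 60)

x[abs(mad_scores(x)) > 3.5]  # values flagged by the MAD rule (cut-off assumed)
x[iqr_flags(x)]              # values flagged by the IQR rule
```

Both rules work for any sample size, so unlike dixon.test() they can be applied to samples larger than 30.
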
-- 
View this message in context: http://www.nabble.com/dixon-test-tp18940260p18960162.html
Sent from the R help mailing list archive at Nabble.com.


