[R] significance test interquartile ranges
Rui Barradas
ruipbarradas at sapo.pt
Sat Jul 14 12:25:49 CEST 2012
Hello,
There is a test for IQR equality, due to Westenberg (1948), that can be
found online if one really looks. It starts by pooling the two samples
into one and computing the 1st and 3rd quartiles of the pool. Then a
three-column table is built, with one row per sample. The middle column
holds the counts between the quartiles and the two outer columns hold the
counts outside them. The outer columns are collapsed into one and a
Fisher exact test is conducted on the resulting 2x2 table.
R code could be:
iqr.test <- function(x, y){
    # Quartiles of the pooled sample
    qq <- quantile(c(x, y), probs = c(0.25, 0.75))
    # Counts strictly inside (a, c) and outside (b, d) the pooled quartiles
    a <- sum(qq[1] < x & x < qq[2])
    b <- length(x) - a
    c <- sum(qq[1] < y & y < qq[2])
    d <- length(y) - c
    # 2x2 table: rows are the samples, columns are inside/outside
    m <- matrix(c(a, c, b, d), ncol = 2)
    # Log of the hypergeometric probability of the observed table
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    # Twice that probability, capped at 1
    p.value <- min(1, 2*exp(numer - denom))
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    method <- "Westenberg-Mood test for IQR equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}
n <- 1e3
pv <- numeric(n)
set.seed(2319)
for(i in 1:n){
    x <- rnorm(sample(20:30, 1), 4, 1)
    y <- rchisq(sample(20:40, 1), df=4)
    pv[i] <- iqr.test(x, y)$p.value
}
sum(pv < 0.05)/n # rejection rate at the 5% level (reported as 0.8 in the original post)
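The description above mentions a Fisher exact test on the collapsed 2x2 table. As a hedged sketch, the same table can also be handed to fisher.test() from the stats package, which returns a two-sided p-value that cannot exceed 1. The name iqr.fisher.test and the example data are made up here for illustration and are not part of the original code.

```r
# Sketch only: iqr.fisher.test is a hypothetical name, not from the post
iqr.fisher.test <- function(x, y) {
  # Quartiles of the pooled sample
  qq <- quantile(c(x, y), probs = c(0.25, 0.75))
  # Counts strictly between the pooled quartiles, per sample
  in.x <- sum(qq[1] < x & x < qq[2])
  in.y <- sum(qq[1] < y & y < qq[2])
  # Rows are the samples, columns are inside/outside the pooled quartiles
  m <- matrix(c(in.x, in.y, length(x) - in.x, length(y) - in.y), ncol = 2,
              dimnames = list(sample = c("x", "y"),
                              position = c("inside", "outside")))
  fisher.test(m)
}

# Illustrative data: same centre, different spread
set.seed(1)
x <- rnorm(30, 4, 1)
y <- rnorm(30, 4, 3)
res <- iqr.fisher.test(x, y)
res$p.value
```

The result is an ordinary "htest" object, so it prints like any other test in R.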
Hope this helps,
Rui Barradas
On 14-07-2012 09:01, peter dalgaard wrote:
>
> On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:
>
>> On 13/07/2012 21:37, Greg Snow wrote:
>>> A permutation test may be appropriate:
>>
>> Yes, it may, but precisely which one is unclear. You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed.
>>
>> I do not think this can be solved without further assumptions. E.g., people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape).
>
> (Brian knows this, of course, but I thought it useful to insert a little quibbling.)
>
> "Sensitive" is perhaps a little misleading here. The test statistic in the Wilcoxon test is essentially an estimate of the probability that a random observation in one group is bigger than a random observation in the other group. It isn't hard to imagine a situation where that quantity is unaffected by a dispersion change, so the test is not sensitive in the sense that it can detect dispersion changes between sufficiently large samples.
>
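A small sketch of the point just quoted, with simulated data chosen here purely for illustration: the W statistic reported by wilcox.test(), divided by the number of (x, y) pairs, is the fraction of pairs with x > y. For two normal samples with the same centre and different spread, that fraction estimates a probability of 0.5 regardless of the dispersion change.

```r
set.seed(42)
x <- rnorm(200, mean = 0, sd = 1)
y <- rnorm(300, mean = 0, sd = 3)   # same centre, three times the spread

# W counts the pairs with x[i] > y[j] (no ties in continuous data)
w <- wilcox.test(x, y)$statistic
p.hat <- unname(w) / (length(x) * length(y))

# Direct count over all pairs gives the same estimate of P(X > Y)
p.direct <- mean(outer(x, y, ">"))
c(p.hat = p.hat, p.direct = p.direct)
```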
> However, the point is that p values _rely on_ the null hypothesis that two distributions are exactly the same. This is mostly uncontroversial if you are testing for an irrelevant grouping, but if you need confidence intervals for the difference, you are implicitly assuming a location-shift model.
>
> The same thing is true for permutation tests in general: You need to be rather careful about what the assumptions are that allow you to interchange things. Asymptotically, the distribution of the IQR depends on the values of the density at the true quartiles. These could be different in the two groups, and easily completely unrelated to those of a pooled sample.
>
> I think that I would suggest finding an error estimate for the IQR (or maybe log IQR) in each group separately, perhaps by bootstrapping, and then comparing between groups with an asymptotic z test. The main caveat is whether you have sufficiently large sample sizes for asymptotics to hold.
>
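One possible reading of the suggestion above, as a rough sketch rather than a vetted procedure: bootstrap the standard error of log(IQR) within each group, then compare the two estimates with a z statistic. The helper boot.se.logiqr, the number of resamples B and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper: bootstrap standard error of log(IQR) for one sample
boot.se.logiqr <- function(v, B = 2000) {
  stat <- replicate(B, log(IQR(sample(v, replace = TRUE))))
  sd(stat)
}

set.seed(123)
x <- rnorm(200, 4, 1)
y <- rchisq(250, df = 4)

# Point estimates and bootstrap standard errors, one per group
est <- c(x = log(IQR(x)), y = log(IQR(y)))
se  <- c(x = boot.se.logiqr(x), y = boot.se.logiqr(y))

# Asymptotic z test for equal log IQRs (two-sided)
z <- unname((est["x"] - est["y"]) / sqrt(se["x"]^2 + se["y"]^2))
p.value <- 2 * pnorm(-abs(z))
c(z = z, p.value = p.value)
```

As the quoted text warns, this leans on the samples being large enough for the normal approximation to hold.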
> Peter D.
>
>>
>> I nearly suggested (yesterday) doing the permutation test on differences from medians in the two groups. But really this is off-topic for R-help and needs interaction with a knowledgeable statistician to refine the question.
>>
>>> 1. compute the ratio of the 2 IQR values (or other comparison of interest)
>>> 2. combine the data from the 2 samples into 1 pool, then randomly
>>> split into 2 groups (matching sample sizes of original) and compute
>>> the ratio of the IQR values for the 2 new samples.
>>> 3. repeat #2 a bunch of times (like for a total of 999 random splits)
>>> and combine with the original value.
>>> 4. (optional, but strongly suggested) plot a histogram of all the
>>> ratios and place a reference line of the original ratio on the plot.
>>> 5. calculate the proportion of ratios that are as extreme or more
>>> extreme than the original, this is the (approximate) p-value.
>>
>> I think it is an 'exact' (but random) p-value.
>>
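The numbered steps quoted above could be sketched as follows. The function name perm.iqr.test is hypothetical, and measuring "as extreme" on the log scale (so that a ratio r and its reciprocal 1/r count as equally extreme) is an assumption made here, not something stated in the thread. As discussed in the thread, the permutation null is that the two samples are exchangeable, which is stronger than equal IQRs.

```r
# Hypothetical helper following the numbered steps
perm.iqr.test <- function(x, y, B = 999) {
  obs <- IQR(x) / IQR(y)                 # step 1: observed ratio
  pool <- c(x, y)
  nx <- length(x)
  perm <- replicate(B, {                 # steps 2 and 3: random splits
    idx <- sample(length(pool), nx)
    IQR(pool[idx]) / IQR(pool[-idx])
  })
  ratios <- c(obs, perm)                 # original value is included
  # step 5: proportion at least as extreme as the observed ratio
  p <- mean(abs(log(ratios)) >= abs(log(obs)))
  list(ratio = obs, ratios = ratios, p.value = p)
}

set.seed(2012)
x <- rnorm(25, 4, 1)
y <- rnorm(30, 4, 2)
res <- perm.iqr.test(x, y)
res$p.value

# step 4 (optional): histogram with a reference line at the observed ratio
# hist(log(res$ratios)); abline(v = log(res$ratio), col = "red")
```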
>>>
>>> On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
>>> <joerg.schaber at med.ovgu.de> wrote:
>>>> Hi,
>>>>
>>>> I have two non-normal distributions and use interquartile ranges as a dispersion measure.
>>>> Now I am looking for a test, which tests whether the interquartile ranges from the two distributions are significantly different.
>>>> Any idea?
>>>>
>>>> Thanks,
>>>>
>>>> joerg
>>>>
>>
>>
>> --
>> Brian D. Ripley, ripley at stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>