[R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.
Marc Schwartz
marc_schwartz at me.com
Fri Sep 25 20:16:52 CEST 2009
Greg and John,
Just to throw it out there, the data sets here are small enough that
you co do a fully enumerable permutation test by replacing your
replicate() call with:
perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x]))
This is based on an off-list communication that I had with Peter
Dalgaard about 3 years ago for a different scenario and gives you:
> choose(17, 9)
[1] 24310
permutations.
It does not take long:
> system.time(perms <- combn(17, 9, function(x) median(sets[x]) -
median(sets[-x])))
user system elapsed
3.863 0.019 3.898
Which yields:
> str(perms)
num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
> table(perms)
perms
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
285 175 2595 7000 8050 875 3720 1425 185
perms <- c(orig,perms)
prop.test( sum(perms>=orig), length(perms) )
# or
binom.test(sum(perms >= orig), length(perms))
# Variation on the graphic...
plot(table(perms), type = "h")
abline(v = orig, col = "blue")
See ?combn for more information.
HTH,
Marc Schwartz
On Sep 25, 2009, at 11:47 AM, Greg Snow wrote:
> Yes, I agree that the median makes the most sense here, but there
> could be other measures of location that would be of interest
> (quartiles, some version of the rank sum).
>
> Here is some sample code for a permutation test on the medians
> (there are a couple of packages that will do this as well, but this
> is pretty straight forward with straight R code):
>
> set1 <- c(1,3,2,2,4,3,3,2,2)
> set2 <- c(4,4,4,3,3,5,4,4)
>
> sets <- c(set1,set2)
> g1 <- seq_along(set1)
>
> orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )
>
> perms <- replicate( 1999, { tmp <- sample(sets)
> median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )
>
> # or
>
> pb <- winProgressBar(max=1999)
>
> setWinProgressBar(pb, 0)
>
> perms <- replicate(1999, { setWinProgressBar( pb,
> getWinProgressBar(pb) + 1 )
> tmp <- sample(sets)
> median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )
>
> close(pb)
>
>
> perms <- c(orig,perms)
> sum( perms >= orig )
> mean( perms >= orig )
> prop.test( sum(perms>=orig), length(perms) )
> hist(perms)
> abline(v=orig, col='blue')
>
> (if you want the progress bar on an os other than windows, then use
> the tcltk package and the tkProgressBar).
>
> Hope this helps,
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: John Sorkin [mailto:jsorkin at grecc.umaryland.edu]
>> Sent: Thursday, September 24, 2009 2:52 PM
>> To: Greg Snow; r-help at r-project.org
>> Subject: Re: [R] Non-parametric test for location with two unpaired
>> setsof data measured on ordinal scale.
>>
>> Greg,
>> I used the term location because I did not want to use the terms mean
>> or median for the exact reason that you gave; these to values can be
>> different in a given distribution. I want to test the null hypothesis
>> that the data come from a single distribution. This is often done by
>> comparing a measure of location (e.g. mean for ANOVOA), but as you
>> know
>> the mean need not be the only measure of location that is tested.
>> Giiven that my data are measured on an ordinal scale, the mean is
>> without meaning, so I suspect that the best measure for me would be a
>> comparison of medians, but I am open to other suggestions.
>> John
>>
>> John David Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>
>>>>> Greg Snow <Greg.Snow at imail.org> 9/24/2009 4:30 PM >>>
>> What do you mean by location? I can think of examples where 2
>> distributions have the same median but different means, or the same
>> means but different medians.
>>
>> Are you willing to assume that the distributions are exactly the same
>> under the null hypothesis? (not just the same 'center/location')
>>
>> I would probably do a permutation test on the difference between the
>> means or medians (which ever you think is more meaningful), this
>> assumes the exact same distribution under the null.
>>
>> You can also do a Mann-Whitney/Wilcoxin test (but I don't like
>> explaining, or sometimes even thinking about, what it is actually
>> testing), you could do a bootstrap confidence interval on the
>> difference between means/medians (does not assume distributions are
>> the
>> same, just have same mean/median), or you could just replace all
>> values
>> by their ranks and do a t-test (essentially transforms the data to a
>> uniform distribution, the CLT for the uniform kicks in around n=5,
>> but
>> I would simulate just to check).
>>
>> This is not the nice simple answer that you were probably looking
>> for,
>> but hopefully it gives you some things to think about that will help,
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>> project.org] On Behalf Of John Sorkin
>>> Sent: Thursday, September 24, 2009 1:08 PM
>>> To: r-help at r-project.org
>>> Subject: [R] Non-parametric test for location with two unpaired sets
>> of
>>> data measured on ordinal scale.
>>>
>>> Please forgive a stats question.
>>>
>>> I have to sets of data (unpaired) measured on an ordinal scale. I
>> want
>>> to test to see if the two sets are different (i.e. do they have the
>>> same location):
>>>
>>> set1: 1,3,2,2,4,3,3,2,2
>>> set: 4,4,4,3,3,5,4,4
>>>
>>> What is the most appropriate non-parametric test to test location?
>>>
>>> Thanks,
>>> John
>>>
>>> Confidentiality Statement:
>>> This email message, including any attachments, is for
>>> th...{{dropped:6}}
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>> guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> Confidentiality Statement:
>> This email message, including any attachments, is for ...{{dropped:
>> 6}}
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list