[R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.

Fri Sep 25 20:16:52 CEST 2009

Greg and John,

Just to throw it out there, the data sets here are small enough that  
you co do a fully enumerable permutation test by replacing your  
replicate() call with:

   perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x]))

This is based on an off-list communication that I had with Peter  
Dalgaard about 3 years ago for a different scenario and gives you:

 > choose(17, 9)
[1] 24310

permutations.

It does not take long:

 > system.time(perms <- combn(17, 9, function(x) median(sets[x]) -  
median(sets[-x])))
    user  system elapsed
   3.863   0.019   3.898

Which yields:

 > str(perms)
  num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...

 > table(perms)
perms
   -2 -1.5   -1 -0.5    0  0.5    1  1.5    2
  285  175 2595 7000 8050  875 3720 1425  185

perms <- c(orig,perms)

prop.test( sum(perms>=orig), length(perms) )
# or
binom.test(sum(perms >= orig), length(perms))

# Variation on the graphic...
plot(table(perms), type = "h")
abline(v = orig, col = "blue")

See ?combn for more information.

HTH,

Marc Schwartz

On Sep 25, 2009, at 11:47 AM, Greg Snow wrote:

> Yes, I agree that the median makes the most sense here, but there  
> could be other measures of location that would be of interest  
> (quartiles, some version of the rank sum).
>
> Here is some sample code for a permutation test on the medians  
> (there are a couple of packages that will do this as well, but this  
> is pretty straight forward with straight R code):
>
> set1 <- c(1,3,2,2,4,3,3,2,2)
> set2 <- c(4,4,4,3,3,5,4,4)
>
> sets <- c(set1,set2)
> g1 <- seq_along(set1)
>
> orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )
>
> perms <- replicate( 1999, { tmp <- sample(sets)
> 	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )
>
> #  or
>
> pb <- winProgressBar(max=1999)
>
> setWinProgressBar(pb, 0)
>
> perms <- replicate(1999, { setWinProgressBar( pb,  
> getWinProgressBar(pb) + 1 )
> 	tmp <- sample(sets)
> 	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )
>
> close(pb)
>
>
> perms <- c(orig,perms)
> sum( perms >= orig )
> mean( perms >= orig )
> prop.test( sum(perms>=orig), length(perms) )
> hist(perms)
> abline(v=orig, col='blue')
>
> (if you want the progress bar on an os other than windows, then use  
> the tcltk package and the tkProgressBar).
>
> Hope this helps,
>
>
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: John Sorkin [mailto:jsorkin at grecc.umaryland.edu]
>> Sent: Thursday, September 24, 2009 2:52 PM
>> To: Greg Snow; r-help at r-project.org
>> Subject: Re: [R] Non-parametric test for location with two unpaired
>> setsof data measured on ordinal scale.
>>
>> Greg,
>> I used the term location because I did not want to use the terms mean
>> or median for the exact reason that you gave; these to values can be
>> different in a given distribution. I want to test the null hypothesis
>> that the data come from a single distribution. This is often done by
>> comparing a measure of location (e.g. mean for ANOVOA), but as you  
>> know
>> the mean need not be the only measure of location that is tested.
>> Giiven that my data are measured on an ordinal scale, the mean is
>> without meaning, so I suspect that the best measure for me would be a
>> comparison of medians, but I am open to other suggestions.
>> John
>>
>> John David Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>
>>>>> Greg Snow <Greg.Snow at imail.org> 9/24/2009 4:30 PM >>>
>> What do you mean by location?  I can think of examples where 2
>> distributions have the same median but different means, or the same
>> means but different medians.
>>
>> Are you willing to assume that the distributions are exactly the same
>> under the null hypothesis? (not just the same 'center/location')
>>
>> I would probably do a permutation test on the difference between the
>> means or medians (which ever you think is more meaningful), this
>> assumes the exact same distribution under the null.
>>
>> You can also do a Mann-Whitney/Wilcoxin test (but I don't like
>> explaining, or sometimes even thinking about, what it is actually
>> testing), you could do a bootstrap confidence interval on the
>> difference between means/medians (does not assume distributions are  
>> the
>> same, just have same mean/median), or you could just replace all  
>> values
>> by their ranks and do a t-test (essentially transforms the data to a
>> uniform distribution, the CLT for the uniform kicks in around n=5,  
>> but
>> I would simulate just to check).
>>
>> This is not the nice simple answer that you were probably looking  
>> for,
>> but hopefully it gives you some things to think about that will help,
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>> project.org] On Behalf Of John Sorkin
>>> Sent: Thursday, September 24, 2009 1:08 PM
>>> To: r-help at r-project.org
>>> Subject: [R] Non-parametric test for location with two unpaired sets
>> of
>>> data measured on ordinal scale.
>>>
>>> Please forgive a stats question.
>>>
>>> I have to sets of data (unpaired) measured on an ordinal scale. I
>> want
>>> to test to see if the two sets are different (i.e. do they have the
>>> same location):
>>>
>>> set1: 1,3,2,2,4,3,3,2,2
>>> set:   4,4,4,3,3,5,4,4
>>>
>>> What is the most appropriate non-parametric test to test location?
>>>
>>> Thanks,
>>> John
>>>
>>> Confidentiality Statement:
>>> This email message, including any attachments, is for
>>> th...{{dropped:6}}
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>> guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> Confidentiality Statement:
>> This email message, including any attachments, is for ...{{dropped: 
>> 6}}
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.