[R] Non-parametric test for location with two unpaired setsof data measured on ordinal scale.

Greg Snow Greg.Snow at imail.org
Fri Sep 25 18:47:10 CEST 2009


Yes, I agree that the median makes the most sense here, but there could be other measures of location that would be of interest (quartiles, some version of the rank sum).

Here is some sample code for a permutation test on the medians (there are a couple of packages that will do this as well, but this is pretty straight forward with straight R code):

set1 <- c(1,3,2,2,4,3,3,2,2)
set2 <- c(4,4,4,3,3,5,4,4)

sets <- c(set1,set2)
g1 <- seq_along(set1)

orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )

perms <- replicate( 1999, { tmp <- sample(sets)
	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

#  or

pb <- winProgressBar(max=1999)

setWinProgressBar(pb, 0)

perms <- replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 )
	tmp <- sample(sets)
	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

close(pb)


perms <- c(orig,perms)
sum( perms >= orig )
mean( perms >= orig )
prop.test( sum(perms>=orig), length(perms) )
hist(perms)
abline(v=orig, col='blue')

(if you want the progress bar on an os other than windows, then use the tcltk package and the tkProgressBar).

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: John Sorkin [mailto:jsorkin at grecc.umaryland.edu]
> Sent: Thursday, September 24, 2009 2:52 PM
> To: Greg Snow; r-help at r-project.org
> Subject: Re: [R] Non-parametric test for location with two unpaired
> setsof data measured on ordinal scale.
> 
> Greg,
> I used the term location because I did not want to use the terms mean
> or median for the exact reason that you gave; these to values can be
> different in a given distribution. I want to test the null hypothesis
> that the data come from a single distribution. This is often done by
> comparing a measure of location (e.g. mean for ANOVOA), but as you know
> the mean need not be the only measure of location that is tested.
> Giiven that my data are measured on an ordinal scale, the mean is
> without meaning, so I suspect that the best measure for me would be a
> comparison of medians, but I am open to other suggestions.
> John
> 
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> 
> >>> Greg Snow <Greg.Snow at imail.org> 9/24/2009 4:30 PM >>>
> What do you mean by location?  I can think of examples where 2
> distributions have the same median but different means, or the same
> means but different medians.
> 
> Are you willing to assume that the distributions are exactly the same
> under the null hypothesis? (not just the same 'center/location')
> 
> I would probably do a permutation test on the difference between the
> means or medians (which ever you think is more meaningful), this
> assumes the exact same distribution under the null.
> 
> You can also do a Mann-Whitney/Wilcoxin test (but I don't like
> explaining, or sometimes even thinking about, what it is actually
> testing), you could do a bootstrap confidence interval on the
> difference between means/medians (does not assume distributions are the
> same, just have same mean/median), or you could just replace all values
> by their ranks and do a t-test (essentially transforms the data to a
> uniform distribution, the CLT for the uniform kicks in around n=5, but
> I would simulate just to check).
> 
> This is not the nice simple answer that you were probably looking for,
> but hopefully it gives you some things to think about that will help,
> 
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of John Sorkin
> > Sent: Thursday, September 24, 2009 1:08 PM
> > To: r-help at r-project.org
> > Subject: [R] Non-parametric test for location with two unpaired sets
> of
> > data measured on ordinal scale.
> >
> > Please forgive a stats question.
> >
> > I have to sets of data (unpaired) measured on an ordinal scale. I
> want
> > to test to see if the two sets are different (i.e. do they have the
> > same location):
> >
> > set1: 1,3,2,2,4,3,3,2,2
> > set:   4,4,4,3,3,5,4,4
> >
> > What is the most appropriate non-parametric test to test location?
> >
> > Thanks,
> > John
> >
> > Confidentiality Statement:
> > This email message, including any attachments, is for
> > th...{{dropped:6}}
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:6}}




More information about the R-help mailing list