[R] wilcox.test; data type conversion?

Steven McKinney smckinney at bccrc.ca
Fri Oct 29 06:24:29 CEST 2010


You can set up the data as

> grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"), levels = c("G", "VG", "MVG"))
> grade
[1] MVG VG  VG  G   MVG G   VG  G   VG 
Levels: G < VG < MVG
> sex <- factor(c( "male", "male", "female", "male", "female", "male", "female", "male", "male"), levels = c("male", "female"))
> sex
[1] male   male   female male   female male   female male   male  
Levels: male female
> gradesbysex <- data.frame(grade, sex)
> 
> gradesbysex
  grade    sex
1   MVG   male
2    VG   male
3    VG female
4     G   male
5   MVG female
6     G   male
7    VG female
8     G   male
9    VG   male

Now for the Wilcoxon-Mann-Whitney test:

> wilcox.test(grade ~ sex, data = gradesbysex)
Error in wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  : 
  'x' must be numeric

I'm not sure if anyone has written a version that will work on ordered factor variables,
but you can coerce the ordered factor to its underlying integer representation with e.g.

> wilcox.test(as.integer(grade) ~ sex, data = gradesbysex)

	Wilcoxon rank sum test with continuity correction

data:  as.integer(grade) by sex 
W = 4.5, p-value = 0.2695
alternative hypothesis: true location shift is not equal to 0 

Warning message:
In wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  :
  cannot compute exact p-value with ties
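
If the warning is a distraction for students, one option (a sketch only, not
necessarily the best choice for teaching) is to ask for the normal
approximation explicitly with exact = FALSE. As far as I know this gives the
same W and p-value as above without the warning. The snippet also prints the
integer codes, to confirm that they follow the declared level order
G < VG < MVG. The names codes and res are just illustrative.

```r
# Rebuild the data so the snippet runs on its own
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

# as.integer() on an ordered factor returns the level codes (G = 1, VG = 2, MVG = 3)
codes <- as.integer(grade)
print(codes)

# exact = FALSE requests the normal approximation directly,
# so no exact p-value is attempted in the presence of ties
res <- wilcox.test(as.integer(grade) ~ sex, data = gradesbysex, exact = FALSE)
print(res)
```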

You can break the ties by jittering the data.  Each jitter will of course
produce different tie breakers.  A few repeats of the test, or a loop and
some summaries of the outcomes, will give you an idea of the
"average" result.

> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)

	Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 4, p-value = 0.2619
alternative hypothesis: true location shift is not equal to 0 

> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)

	Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 3, p-value = 0.1667
alternative hypothesis: true location shift is not equal to 0 

> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)

	Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 7, p-value = 0.7143
alternative hypothesis: true location shift is not equal to 0 

> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)

	Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 6, p-value = 0.5476
alternative hypothesis: true location shift is not equal to 0 
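
The "loop and some summaries" idea mentioned above could be sketched as
follows. The seed and the 1000 repeats are arbitrary choices of mine, and
pvals is just an illustrative name. Each run breaks the ties differently, so
the summary only shows how much the p-value moves around under jittering.

```r
# Rebuild the data so the snippet runs on its own
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

set.seed(1)  # arbitrary seed, only so the run can be repeated

# Repeat the jittered test many times and keep only the p-values
pvals <- replicate(1000,
  wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)$p.value)

# Spread of p-values across the random tie breakers
print(summary(pvals))
```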


I'll let you judge elegance.


As for the barplots, I think all you need to do is specify the row and column order you'd like.

Try this example

> barplot(VADeaths, beside = TRUE)
> barplot(VADeaths[5:1,c(4, 2, 3, 1)], beside = TRUE)

Substitute your data, use beside=FALSE to stack, etc.
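
For the grades data, a sketch along the same lines (tab is just an
illustrative name): once grade is an ordered factor, table() keeps the level
order, so the stacked bars follow G, VG, MVG rather than alphabetical order,
and indexing the rows by name gives any other stacking order.

```r
# Rebuild the data so the snippet runs on its own
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

# Rows are grades (in level order), columns are sexes
tab <- table(gradesbysex$grade, gradesbysex$sex)
print(tab)

# Stacked bars, one per sex, stacked in level order G, VG, MVG
barplot(tab, legend.text = TRUE)

# Reverse the stacking order by reordering the rows by name
barplot(tab[c("MVG", "VG", "G"), ], legend.text = TRUE)
```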

Steven McKinney

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Par Leijonhufvud [par at hunter-gatherer.org]
Sent: October 28, 2010 8:37 PM
To: rhelp
Subject: [R] wilcox.test; data type conversion?

I'm working on a quick tutorial for my students, and was planning on
using Mann-Whitney U as one of the tests.

I have the following (fake) data

 grade <- c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG")
 sex <- c( "male", "male", "female", "male", "female", "male", "female", "male", "male")
 gradesbysex <- data.frame(grade, sex)

The grades are in the Swedish system, where the order is G < VG < MVG.

The idea is that they will investigate if they can show a grade
difference by sex (i.e. that the teacher gives better grades to boys or
girls).

Since wilcox.test needs the order of the grades, it wants a numeric
vector for the data. Is there a good and simple (i.e. student
compatible) way to handle this? I could tell them to enter data as
numbers instead, but an elegant way to do this inside R would be
preferable.


On the same theme, is there a way to tell barplot, when making
stacked barplots, to stack the data in a particular order (the default
appears to be alphabetical)?
