[R] wilcox.test; data type conversion?
Steven McKinney
smckinney at bccrc.ca
Fri Oct 29 06:24:29 CEST 2010
You can set up the data as
> grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"), levels = c("G", "VG", "MVG"))
> grade
[1] MVG VG VG G MVG G VG G VG
Levels: G < VG < MVG
> sex <- factor(c( "male", "male", "female", "male", "female", "male", "female", "male", "male"), levels = c("male", "female"))
> sex
[1] male male female male female male female male male
Levels: male female
> gradesbysex <- data.frame(grade, sex)
>
> gradesbysex
grade sex
1 MVG male
2 VG male
3 VG female
4 G male
5 MVG female
6 G male
7 VG female
8 G male
9 VG male
Now for the Wilcoxon-Mann-Whitney test
> wilcox.test(grade ~ sex, data = gradesbysex)
Error in wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L, :
'x' must be numeric
I'm not sure if anyone has written a version that will work on ordered factor variables,
but you can coerce the ordered factor to its underlying integer representation with e.g.
> wilcox.test(as.integer(grade) ~ sex, data = gradesbysex)
Wilcoxon rank sum test with continuity correction
data: as.integer(grade) by sex
W = 4.5, p-value = 0.2695
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L, :
cannot compute exact p-value with ties
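As a small sketch of what the coercion does: as.integer() on an ordered factor returns the level codes (here G = 1, VG = 2, MVG = 3), so the integers keep the grade order. Passing exact = FALSE, an existing wilcox.test argument, requests the normal approximation explicitly, which should avoid the ties warning while reporting the same W and p-value as above. The names codes and res are only for illustration.

```r
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

# Level codes follow the declared level order: G = 1, VG = 2, MVG = 3
codes <- as.integer(gradesbysex$grade)
data.frame(grade = gradesbysex$grade, code = codes)

# Ask for the normal approximation up front rather than letting
# wilcox.test fall back to it with a warning when it finds ties
res <- wilcox.test(as.integer(grade) ~ sex, data = gradesbysex, exact = FALSE)
res
```

Note that this only works as intended because grade was created with explicit levels; with the default alphabetical levels the codes would not reflect G < VG < MVG.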
You can break the ties by jittering the data. Each jitter will of course
produce different tie breakers. A few repeats of the test, or a loop and
some summaries of the outcomes, will give you an idea of the
"average" result.
> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test
data: jitter(as.integer(grade)) by sex
W = 4, p-value = 0.2619
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test
data: jitter(as.integer(grade)) by sex
W = 3, p-value = 0.1667
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test
data: jitter(as.integer(grade)) by sex
W = 7, p-value = 0.7143
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test
data: jitter(as.integer(grade)) by sex
W = 6, p-value = 0.5476
alternative hypothesis: true location shift is not equal to 0
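The "loop and some summaries" idea could be sketched as follows. This is only one way to do it: the number of repeats (n_rep), the seed and the name sims are arbitrary choices for illustration, not part of the original suggestion.

```r
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

set.seed(1)    # arbitrary seed, only so the sketch is reproducible
n_rep <- 1000  # arbitrary number of jittered repeats

# Each repeat jitters the integer codes, runs the test, and keeps W and p
sims <- replicate(n_rep, {
  res <- wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
  c(W = unname(res$statistic), p = res$p.value)
})

summary(sims["p", ])  # spread of the p-values across tie-breakings
table(sims["W", ])    # how often each value of W occurred
```

The spread of p-values shows how much the conclusion depends on the random tie-breaking, which is worth pointing out to students before quoting any single jittered result.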
I'll let you judge elegance.
As for the barplots, I think all you need to do is specify the row and column order you'd like.
Try this example
> barplot(VADeaths, beside = TRUE)
> barplot(VADeaths[5:1,c(4, 2, 3, 1)], beside = TRUE)
Substitute your data, use beside=FALSE to stack, etc.
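Applied to the grades data, a sketch might look like this. It assumes the ordered factor set up earlier in this message; table() then lists the grades in level order rather than alphabetically, and barplot stacks the rows in the order they appear in the matrix. The name counts is only for illustration.

```r
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

# Rows follow the factor levels (G, VG, MVG), columns the sex levels
counts <- table(gradesbysex$grade, gradesbysex$sex)
counts

# Stacked bars: the first row (G) is drawn at the bottom of each bar
barplot(counts, beside = FALSE, legend.text = TRUE)

# Any other order can be requested by indexing rows and columns by name
barplot(counts[c("MVG", "VG", "G"), c("female", "male")],
        beside = FALSE, legend.text = TRUE)
```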
Steven McKinney
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Par Leijonhufvud [par at hunter-gatherer.org]
Sent: October 28, 2010 8:37 PM
To: rhelp
Subject: [R] wilcox.test; data type conversion?
I'm working on a quick tutorial for my students, and was planning on
using Mann-Whitney U as one of the tests.
I have the following (fake) data
grade <- c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG")
sex <- c( "male", "male", "female", "male", "female", "male", "female", "male", "male")
gradesbysex <- data.frame(grade, sex)
The grades are in the Swedish system, where the order is G < VG < MVG
The idea is that they will investigate if they can show a grade
difference by sex (i.e. that the teacher gives better grades to boys or
girls).
Since wilcox.test needs the order of the grades, it wants a numeric
vector for the data. Is there a good and simple (i.e. student
compatible) way to handle this? I could tell them to enter data as
numbers instead, but an elegant way to do this inside R would be
preferable.
On the same theme, is there a way to tell barplot, when making
stacked barplots, to stack the data in a particular order (the default
appears to be alphabetical)?