[R] nonparametric multiple sample comparison
aolinto_r@bignet.com.br
aolinto_r at bignet.com.br
Sun Apr 25 17:52:36 CEST 2004
Hello all,
Here goes one of my first functions.
I want to make a nonparametric multiple sample comparison with unequal sample
sizes (see Zars Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11, pg.
288 Example 11.10). In the real world, I want to compare samples of fish
length captured with different fishing gears.
After using the Kruskal-Wallis test I want to check the differences between
each two samples.
I wrote the function multcomp (see below) and its working OK but I still have
some doubts. To use it one must have a two-column dataframe with groups and
measurements.
1) In line 20 (Results <- data.frame....) I create a dataframe to store the final
results. It has a fixed row number but it would be better to have a variable
number, i.e., the number of combination of the groups taken 2 by 2 (4 groups =
5 combinations, 6 groups = 15 combinations). Which function in R returns the
number of combinations? I couldnt find it!
2) In lines 35 to 39 I print the result tables. How can I make a "clean"
print, without row numbers and quotation marks?
3) Also Id like to calculate the critical values "q" and p-values but I
couldnt find the distribution indicated in Zars Table B.15
(App107) "Critical Values of Q for Nonparametric Multiple Comparison Testing".
Is it possible to calculate these numbers in R?
Thanks in advance for any help or comments to improve this function.
Best regards,
Antonio Olinto
Sao Paulo Fisheries Institute
Brazil
multcomp <- function(DataSet) {
dat.multcomp <- DataSet
names(dat.multcomp) <- c("VarCat","VarNum")
attach(dat.multcomp)
dat.multcomp$Rank <- rank(VarNum)
attach(dat.multcomp)
RankList <- aggregate(Rank,list(Rank=Rank),FUN=length)
t <- length(RankList$Rank)
st <- 0
for (i in 1:t) if (RankList[i,2]>1) st<-st+(RankList[i,2]^3-RankList[i,2])
LevCat <- levels(dat.multcomp$VarCat)
NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length)
RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum)
MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean)
SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2])
names(SampleSummary)<-c("Samples","RSum","N","RMean")
SampleSummary <- SampleSummary[order(SampleSummary$RMean,decreasing=T),]
NCat <- length(LevCat)
N <- length(dat.multcomp$VarCat)
Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6))
names(Results) <- c("Comparison","Difference","SE","Q")
l <- 1
for (i in 1:(NCat-1)) {
for (j in NCat:(i+1)) {
SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+
(1/SampleSummary[j,3])))
Dif <- SampleSummary[i,4]-SampleSummary[j,4]
Q=Dif/SE
Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1])
Results[l,2] <- round(Dif,4)
Results[l,3] <- round(SE,4)
Results[l,4] <- round(Q,4)
l <-l+1
}
}
print("Sample summary ranked by mean ranks")
print(SampleSummary)
print("")
print("Table of multiple comparisons")
print(Results)
}
===========
Data example taken from Zar (Biostatiscical Analysis):
dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" "))
1 7.68
1 7.69
1 7.7
1 7.7
1 7.72
1 7.73
1 7.73
1 7.76
2 7.71
2 7.73
2 7.74
2 7.74
2 7.78
2 7.78
2 7.8
2 7.81
3 7.74
3 7.75
3 7.77
3 7.78
3 7.8
3 7.81
3 7.84
4 7.71
4 7.71
4 7.74
4 7.79
4 7.81
4 7.85
4 7.87
4 7.91
kruskal.test(pH~Pound)
Kruskal-Wallis rank sum test
data
Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579
multcomp(dat.limno)
[1] "Sample summary ranked by mean ranks"
Samples RSum N RMean
3 3 145.0 7 20.71429
4 4 163.5 8 20.43750
2 2 132.5 8 16.56250
1 1 55.0 8 6.87500
[1] ""
[1] "Table of multiple comparisons"
Comparison Difference SE Q
1 3 vs 1 13.8393 4.6923 2.9493
2 3 vs 2 4.1518 4.6923 0.8848
3 3 vs 4 0.2768 4.6923 0.0590
4 4 vs 1 13.5625 4.5332 2.9918
5 4 vs 2 3.8750 4.5332 0.8548
6 2 vs 1 9.6875 4.5332 2.1370
-------------------------------------------------
WebMail Bignet - O seu provedor do litoral
www.bignet.com.br
More information about the R-help
mailing list