[R] OT: a weighted rank-based, non-paired test statistic ?
Dylan Beaudette
debeaudette at ucdavis.edu
Sat Jun 6 01:44:55 CEST 2009
On Friday 05 June 2009, Thomas Lumley wrote:
> On Fri, 5 Jun 2009, Dylan Beaudette wrote:
> > Is anyone aware of a rank-based, non-paired test such as the
> > Krustal-Wallis, that can accommodate weights?
>
> You don't say what sort of weights, but basically, no.
>
> Whether you have precision weights or sampling weights, the test will no
> longer be distribution-free.
>
> > Alternatively, would it make sense to simulate a dataset by duplicating
> > observations in proportion to their weight, and then using the
> > Krustal-Wallis test?
>
> No.
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
Thanks,
Perhaps I am thinking about the problem incorrectly. Here are some more
details.
I have a series of values, each associated with a polygon that represents an
area on the ground. A sample is made by collecting the polygons (i.e. the
values associated with each polygon) and the area fraction of each polygon
(i.e. the observation's "weight"). In this past I have used this method to
compute a weighted average, or weighted standard deviation.
However, I now have a case where a subset of individuals from the region are
selected, and re-assigned an area fraction. I would like to evaluate if the
subsets are "representative" of the original data-- so far I have been
comparing weighted means.
Unfortunately, the data are prone to having highly skewed distributions--
possibly making my comparisons of weighted-means invalid.
I posted a question about this a while back, but just in case here are some
data that approximately resemble the problem at hand:
# generate some data
# note lack of balance
x.1 <- rnorm(n=100, mean=0, sd=1)
x.2 <- rnorm(n=10, mean=4, sd=3)
x.3 <- rnorm(n=10, mean=0, sd=1.5)
# generate some weights that sum to 1 for each treatment
# c/o this post:
# https://stat.ethz.ch/pipermail/r-help/2008-July/167044.html
#
w.1 <- diff(c(0,sort(sample(seq(1,99,1),99,replace=T)),100)) / 100
w.2 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100
w.3 <- diff(c(0,sort(sample(seq(1,99,1),9,replace=T)),100)) / 100
# generate treatment labels:
trt <- factor(c(rep('baseline', 100), rep(c('data1','data2'), each=10)))
# combine into dataframe
d <- data.frame(values=c(x.1,x.2,x.3), wts=c(w.1,w.2,w.3), treatment=trt)
# compute means and wt.means
d.means <- sapply(by(d, d$treatment, function(i) mean(i$value)), '[')
d.wt.means <- sapply(by(d, d$treatment, function(i) weighted.mean(i$value,
w=i$wts)), '[')
# quick check:
boxplot(values~ treatment, data=d, varwidth=TRUE, border=grey(0.5),
main='symbol size is proportional to observation weight')
points(1:3, d.means, col='green', pch=15, cex=1.2)
points(1:3, d.wt.means, col='blue', pch=15, cex=1.2)
points(jitter(as.numeric(d$treatment)), d$values, col='red',
cex=sqrt(d$wts*100))
legend('topleft', legend=c('original data','mean','wt.mean'), pch=c(1,15,15),
col=c('red','green','blue'))
# another way of looking at the data:
library(lattice)
densityplot(~ values, groups=treatment, data=d, auto.key=list(columns=3),
pch='|')
# comparison of means (without weights):
# effects are equal to treatment means
summary(lm(values ~ treatment, data=d))
# comparison with means
# effects are equal to treatment weighted-means
summary(lm(values ~ treatment, data=d, weights=wts))
Cheers,
Dylan
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
More information about the R-help
mailing list