[R] a "simple" statistic question

Mon Mar 22 18:55:31 CET 2010

Dear Xiang,

Now I understand what you meant.  If you are only interested in
comparing the Good Samples, I think you would have to use the
proportion (Good Sample/Total Sample) or something similar.  Another
thought would be to dummy code the data (e.g., Good = +1, Fair = 0,
Bad = -1).  Then you could compare the means.  Obviously in my example
the mean would be relatively less impacted by Fair than Bad samples.
Another benefit of this approach (over comparing mean number of good
samples from each city) is that you estimate variability within
factories (based of the dummy codes) which controls for many variables
relative to within city.

Once you dummy code, the test itself is not difficult.  Below is a
sample function.  You can either include the mean, standard deviation,
and n of each group OR include the raw data with each factory being a
separate object in a list that you use as "x".  The lambdas are the
weights to apply to the means.  If you want to compare City A to City
B, you could use -1 for every factory in A and +1 for every factory in
B (same idea as a two sample t-test, you just estimate the variability
more finely because it is by factory and then pooled).  A cautionary
note, I just wrote this function myself.  I tested that it gives the
same result as the t.test() function on two samples (mine uses
one-tailed p-values) for simple numeric vectors; however, I have no
idea what it would do with other types of data and it may even appear
to have worked but returned wrong results.

######################################
t.contrast.test <- function(x=NA, lambda, m=NULL, s=NULL, n=NULL,
raw=TRUE, na.rm=TRUE) {
ifelse(identical(raw, TRUE), {
for(i in seq_along(x)) {m[i] <- mean(x[[i]], na.rm=na.rm)};
for(i in seq_along(x)) {s[i] <- sd(x[[i]], na.rm=na.rm)};
for(i in seq_along(x)) {n[i] <- length(x[[i]])};
NA}, {
NA})
df <- sum(n-1)
effect <- sum(m*lambda)
s2.pooled <- weighted.mean(x=s^2, w=n-1)
sample.correction <- sum((lambda^2)/n)
variability <- sqrt(sample.correction*s2.pooled)
t.score <- effect/variability
p.score <- pt(q=t.score, df=df, lower.tail=F)
r.score <- t.score/sqrt((t.score^2)+df)
value <- list(t.score, p.score, r.score, s2.pooled, df)
names(value) <- c("t.contrast", "p.value", "r.contrast",
"pooled.variance", "df")
return(value)}
######################################

I hope that all made sense.

Best Regards,

Joshua

-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/