[R] Stats help with calculating between and within subject variance and confidence intervals
Bert Gunter
gunter.berton at gene.com
Wed Sep 9 23:17:59 CEST 2009
Paul:
If these data are real -- or at least a reasonable facsimile -- then even
though the machines might be considered "random" -- i.e. a sample from a
potential population of machines -- there are too few of them to get a
reasonable estimate of their variance. Better to treat them as fixed and
just do a standard oneway anova with a single "within" = residual error
component.
Incidentally, this situation is quite common in gauge R&R = analytical
method development studies. While the problem is unavoidable -- you only
have so many machines or operators or whatever -- it is unfortunate that
standard statistical references do not point out that it's just basically
silly to try to estimate variance components with so few df, the resulting
confidence intervals, as here, being so wide as to be useless (reflecting
the inadequate information).
Incidentally, a better way to get at this when there are sufficient df is
via the lme() function in the nlme package -- it will work with unbalanced
data and not just in the balanced data situation. But there would be a
considerable learning curve required, I realize.
Cheers,
Bert Gunter
Genentech Nonclinical Biostatistics
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Paul
Sent: Wednesday, September 09, 2009 1:38 PM
To: R-help at r-project.org
Subject: [R] Stats help with calculating between and within subject variance
and confidence intervals
Hello.
I'm trying to find a way in R to calculate between and within subject
variances and confidence intervals for some analytical method
development data.
I've found a reference to a method in Burdick, R. K. & Graybill, F. A.
1992, Confidence Intervals on variance components, CRC Press. This
example is for Balanced Data confidence interval calculation from Pg
62. The data are fill weights from bottles sampled randomly from a
sample of four filling machines. There are 12 values, and the
confidence intervals are for 1-2a = 95%. I have got the same results
as the book but using slightly different fomulae (see variables for H1,
G1 and H12 and G12).
I'd appreciate any help, and any comments on whether their is a better
way to do this.
Thanks
Paul.
> BGBottles
Machine weight
1 1 14.23
2 1 14.96
3 1 14.85
4 2 16.46
5 2 16.74
6 2 15.94
7 3 14.98
8 3 14.88
9 3 14.87
10 4 15.94
11 4 16.07
12 4 14.91
> BGaov<-aov(weight~Machine,data=BGBottles)
> summary(BGaov)
Df Sum Sq Mean Sq F value Pr(>F)
Machine 3 5.3294 1.7765 9.7702 0.004733 **
Residuals 8 1.4546 0.1818
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> BGaov
Call:
aov(formula = weight ~ Machine, data = BGBottles)
Terms:
Machine Residuals
Sum of Squares 5.329425 1.454600
Deg. of Freedom 3 8
Residual standard error: 0.4264094
Estimated effects may be unbalanced
> BGaov<-aov(weight~Machine+Error(Machine),data=BGBottles)
> summary(BGaov)
Error: Machine
Df Sum Sq Mean Sq
Machine 3 5.3294 1.7765
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 8 1.45460 0.18182
> BGaov
Call:
aov(formula = weight ~ Machine + Error(Machine), data = BGBottles)
Grand Mean: 15.4025
Stratum 1: Machine
Terms:
Machine
Sum of Squares 5.329425
Deg. of Freedom 3
Estimated effects may be unbalanced
Stratum 2: Within
Terms:
Residuals
Sum of Squares 1.4546
Deg. of Freedom 8
Residual standard error: 0.4264094
> confint<-95
> a<-(1-(confint/100))/2
> #str(BGaov) # get structure of BGAOV object
> #str(summary(BGaov)) # get structure of BGaov summary
> #summary(aov.1.e)[[2]][[1]]$"Df" # Could also use this syntax
> grandmean<-as.vector(BGaov$"(Intercept)"[[1]][1]) # Grand Mean (I think)
> within<-summary(BGaov)$"Error: Within"[[1]]$"Mean Sq" # S2^2Mean
Square Value for Within Machine = 0.1819
> dfMachine<-summary(BGaov)$"Error: Machine"[[1]]$"Df" # DF for within = 3
> dfWithin<-summary(BGaov)$"Error: Within"[[1]]$"Df" # DF for within = 8
> machine<-summary(BGaov)$"Error: Machine"[[1]]$"Mean Sq" # S1^2Mean
Square for Machine
> between<-(machine-within)/((dfWithin/(dfMachine+1))+1) # (S1^2-S2^2)/J
> total<-between+within
> between # Between Run Variance
[1] 0.53155
> within # Within Run Variance
[1] 0.181825
> total # Total Variance
[1] 0.713375
> betweenCV<-sqrt(between)/grandmean * 100 # Between Machine CV%
> withinCV<-sqrt(within)/grandmean * 100 # Within Machine CV%
> totalCV<-sqrt(total)/grandmean * 100 # Total CV%
> #within confidence intervals (Chi Squared Method)
> withinLCB<-within/qf(1-a,8,Inf) # Within LCB
> withinUCB<-within/qf(a,8,Inf) # Within UCB
> #Between Confidence Intervals (Modified Large Sample Method)
> n1<-dfMachine
> n2<-dfWithin
> G1<-1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this
should be a
> G2<-1-(1/qf(1-a,n2,Inf))
> H1<-(1/qf(a,n1,Inf))-1 # and this should be 1-a, but my results
don't agree
> H2<-(1/qf(a,n2,Inf))-1
>
G12<-((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) #
again, should be a, not 1-a
> H12<-((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again,
should be 1-a, not a
> Vu<-H1^2*machine^2+G2^2*within^2+H12*machine*within
> Vl<-G1^2*machine^2+H2^2*within^2+G12*within*machine
> betweenLCB<-(machine-within-sqrt(Vl))/J # Betwen LCB
> betweenUCB<-(machine-within+sqrt(Vu))/J # Between UCB
> #Total Confidence Intervals (Graybill-Wang Method)
> y<-(machine+(J-1)*within)/J
> totalLCB<-y-(sqrt(G1^2*machine^2+G2^2*(J-1)^2*within^2)/J) # Total LCB
> totalUCB<-y+(sqrt(H1^2*machine^2+H2^2*(J-1)^2*within^2)/J) # Total UCB
> result<-data.frame(Name=c("within", "between",
"total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*10
0,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(wi
thinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandme
an*100))
> result
Name CV LCB UCB
1 within 2.768443 1.869964 5.303702
2 between 4.733483 2.123816 18.551051
3 total 5.483625 3.590745 18.772389
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list