[R] Stats help with calculating between and within subject variance and confidence intervals

Wed Sep 9 23:17:59 CEST 2009

Paul:

If these data are real -- or at least a reasonable facsimile -- then even
though the machines might be considered "random" -- i.e. a sample from a
potential population of machines -- there are too few of them to get a
reasonable estimate of their variance. Better to treat them as fixed and
just do a standard oneway anova with a single "within" = residual error
component.

Incidentally, this situation is quite common in gauge R&R = analytical
method development studies. While the problem is unavoidable -- you only
have so many machines or operators or whatever -- it is unfortunate that
standard statistical references do not point out that it's just basically
silly to try to estimate variance components with so few df, the resulting
confidence intervals, as here, being so wide as to be useless (reflecting
the inadequate information).

Incidentally, a better way to get at this when there are sufficient df is
via the lme() function in the nlme package -- it will work with unbalanced
data and not just in the balanced data situation. But there would be a
considerable learning curve required, I realize.

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Paul
Sent: Wednesday, September 09, 2009 1:38 PM
To: R-help at r-project.org
Subject: [R] Stats help with calculating between and within subject variance
and confidence intervals

Hello.
I'm trying to find a way in R to calculate between and within subject 
variances and confidence intervals for some analytical method 
development data.

I've found a reference to a method in Burdick, R. K. & Graybill, F. A. 
1992, Confidence Intervals on variance components, CRC Press.  This 
example is for Balanced Data confidence interval calculation from Pg 
62.  The data are fill weights from bottles sampled randomly from a 
sample of four filling machines.  There are 12 values, and the 
confidence intervals are for 1-2a = 95%.  I  have got the same results 
as the book but using slightly different fomulae (see variables for H1, 
G1 and H12 and G12).

I'd appreciate any help, and any comments on whether their is a better 
way to do this.

Thanks

Paul.

 > BGBottles
   Machine weight
1        1  14.23
2        1  14.96
3        1  14.85
4        2  16.46
5        2  16.74
6        2  15.94
7        3  14.98
8        3  14.88
9        3  14.87
10       4  15.94
11       4  16.07
12       4  14.91

 > BGaov<-aov(weight~Machine,data=BGBottles)

 > summary(BGaov)
            Df Sum Sq Mean Sq F value   Pr(>F)   
Machine      3 5.3294  1.7765  9.7702 0.004733 **
Residuals    8 1.4546  0.1818                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 > BGaov
Call:
   aov(formula = weight ~ Machine, data = BGBottles)

Terms:
                 Machine Residuals
Sum of Squares  5.329425  1.454600
Deg. of Freedom        3         8

Residual standard error: 0.4264094
Estimated effects may be unbalanced

 > BGaov<-aov(weight~Machine+Error(Machine),data=BGBottles)

 > summary(BGaov)

Error: Machine
        Df Sum Sq Mean Sq
Machine  3 5.3294  1.7765

Error: Within
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  8 1.45460 0.18182               

 > BGaov

Call:
aov(formula = weight ~ Machine + Error(Machine), data = BGBottles)

Grand Mean: 15.4025

Stratum 1: Machine

Terms:
                 Machine
Sum of Squares  5.329425
Deg. of Freedom        3

Estimated effects may be unbalanced

Stratum 2: Within

Terms:
                Residuals
Sum of Squares     1.4546
Deg. of Freedom         8

Residual standard error: 0.4264094

 > confint<-95

 > a<-(1-(confint/100))/2

 > #str(BGaov) # get structure of BGAOV object

 > #str(summary(BGaov)) # get structure of BGaov summary

 > #summary(aov.1.e)[[2]][[1]]$"Df" # Could also use this syntax

 > grandmean<-as.vector(BGaov$"(Intercept)"[[1]][1]) # Grand Mean (I think)

 > within<-summary(BGaov)$"Error: Within"[[1]]$"Mean Sq"  # S2^2Mean 
Square Value for Within Machine = 0.1819

 > dfMachine<-summary(BGaov)$"Error: Machine"[[1]]$"Df"  # DF for within = 3

 > dfWithin<-summary(BGaov)$"Error: Within"[[1]]$"Df"  # DF for within = 8

 > machine<-summary(BGaov)$"Error: Machine"[[1]]$"Mean Sq" # S1^2Mean 
Square for Machine

 > between<-(machine-within)/((dfWithin/(dfMachine+1))+1) # (S1^2-S2^2)/J

 > total<-between+within

 > between # Between Run Variance
[1] 0.53155

 > within # Within Run Variance
[1] 0.181825

 > total # Total Variance
[1] 0.713375

 > betweenCV<-sqrt(between)/grandmean * 100 # Between Machine CV%

 > withinCV<-sqrt(within)/grandmean * 100 # Within Machine CV%

 > totalCV<-sqrt(total)/grandmean * 100 # Total CV%

 > #within confidence intervals (Chi Squared Method)

 > withinLCB<-within/qf(1-a,8,Inf) # Within LCB

 > withinUCB<-within/qf(a,8,Inf) # Within UCB

 > #Between Confidence Intervals (Modified Large Sample Method)

 > n1<-dfMachine

 > n2<-dfWithin

 > G1<-1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this 
should be a

 > G2<-1-(1/qf(1-a,n2,Inf))

 > H1<-(1/qf(a,n1,Inf))-1  # and this should be 1-a, but my results 
don't agree

 > H2<-(1/qf(a,n2,Inf))-1

 > 
G12<-((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # 
again, should be a, not 1-a

 > H12<-((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, 
should be 1-a, not a

 > Vu<-H1^2*machine^2+G2^2*within^2+H12*machine*within

 > Vl<-G1^2*machine^2+H2^2*within^2+G12*within*machine

 > betweenLCB<-(machine-within-sqrt(Vl))/J # Betwen LCB

 > betweenUCB<-(machine-within+sqrt(Vu))/J # Between UCB

 > #Total Confidence Intervals (Graybill-Wang Method)

 > y<-(machine+(J-1)*within)/J

 > totalLCB<-y-(sqrt(G1^2*machine^2+G2^2*(J-1)^2*within^2)/J) # Total LCB

 > totalUCB<-y+(sqrt(H1^2*machine^2+H2^2*(J-1)^2*within^2)/J) # Total UCB

 > result<-data.frame(Name=c("within", "between", 
"total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*10
0,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(wi
thinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandme
an*100))

 > result
     Name       CV      LCB       UCB
1  within 2.768443 1.869964  5.303702
2 between 4.733483 2.123816 18.551051
3   total 5.483625 3.590745 18.772389

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.