[R] Conservative "ANOVA tables" in lmer
Spencer Graves
spencer.graves at pdf.com
Sun Sep 10 14:06:47 CEST 2006
Martin Henry H. Stevens wrote:
> Hi Spencer,
> I would like to make sure I understand Spencer's question and doubts,
> below.
> On Sep 9, 2006, at 11:36 PM, Spencer Graves wrote:
>
>> Peter's example and Doug's "different test" reply sent me to
>> Scheffé's discussion of the balanced and replicated mixed-effects
>> 2-way layout. As I note below, the obvious F test for the fixed
>> effect does not appear to be a likelihood ratio test for anything.
>> Douglas Bates wrote:
>>> On 9/7/06, Douglas Bates <bates at stat.wisc.edu> wrote:
>>>
>>>> On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard
>>>> <p.dalgaard at biostat.ku.dk> wrote:
<snip>
>>>>> I'm very wary of ANY attempt at guesswork in these matters.
>>>>>
>>>>> I may be understanding the post wrongly, but consider this case: Y_ij
>>>>> = mu + z_i + eps_ij, i = 1..3, j=1..100
>>>>>
>>>>> I get rank(X)=1, rank(X:Z)=3, n=300
>>>>>
>>>>> It is well known that the test for mu=0 in this case is obtained by
>>>>> reducing data to group means, xbar_i, and then do a one-sample t
>>>>> test,
>>>>> the square of which is F(1, 2), but it seems to be suggested that
>>>>> F(1, 297) is a conservative test???!
>>>>>
>>>> It's a different test, isn't it? Your test is based upon the between
>>>> group sum of squares with 2 df. I am proposing to use the within
>>>> group sum of squares or its generalization.
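[Editor's note: Peter's reduction-to-group-means argument can be checked with a quick simulation. This is only a sketch; the seed, sd values, and object names are mine, not from the thread.]

```r
## Sketch of Peter's example: Y_ij = mu + z_i + eps_ij, i = 1..3,
## j = 1..100.  The valid test of mu = 0 reduces the data to the 3
## group means and does a one-sample t test on 2 df, whose square is
## F(1, 2) -- not F(1, 297).
set.seed(1)
I <- 3; J <- 100
z <- rnorm(I, sd = 2)                    # random group effects
y <- rep(z, each = J) + rnorm(I * J)     # mu = 0 here
g <- factor(rep(1:I, each = J))
xbar <- tapply(y, g, mean)               # reduce to group means
tt  <- t.test(xbar, mu = 0)              # one-sample t, df = 2
F12 <- unname(tt$statistic)^2            # square of t(2) is F(1, 2)
c(F = F12, p = pf(F12, 1, 2, lower.tail = FALSE))
```

The p-value from `pf(F12, 1, 2, lower.tail = FALSE)` agrees exactly with `tt$p.value`; referring the statistic to F(1, 297) would use the wrong yardstick entirely.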
<snip>
>> For the traditional, balanced, replicated, 2-way mixed-effects
>> analysis, Scheffé (1959, Table 8.1.1, p. 269) gives the expected mean
>> squares for a two-way layout with "I" levels of a fixed effect A, "J"
>> levels of a random effect B, and "K" replicates, as follows:
>> EMS(A: fixed) = var(e) + K*var(A:B) + J*K*MeanSquareA
>> EMS(B: random) = var(e) + I*K*var(B)
>> EMS(A:B: random) = var(e) + K*var(A:B)
>> EMSE = var(e).
>> In this case, the "obvious" test for A is MS(A: fixed) / MS(A:B,
>> random), because this gives us a standard F statistic to test
>> MeanSquareA = 0. However, it doesn't make sense to me to test A
>> without simultaneously assuming var(A:B) = 0.
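[Editor's note: the EMS table above can be checked numerically. The sketch below is mine; I, J, K, the variance components, and the seed are all made up. It shows why MS(A:B), not MSE, is the denominator for testing A: with no fixed A effect but a real A:B interaction, MSA/MS(A:B) behaves like a central F, while MSA/MSE is inflated by the K*var(A:B) term in EMS(A).]

```r
## Monte-Carlo check of the EMS recipe for the balanced two-way
## mixed-effects layout (made-up parameters).
set.seed(42)
I <- 3; J <- 5; K <- 3
A <- factor(rep(1:I, each = J * K))
B <- factor(rep(rep(1:J, each = K), times = I))
one.sim <- function() {
  b  <- rnorm(J)[as.integer(B)]               # random B effect, var(B) = 1
  ab <- rnorm(I * J, sd = 2)[as.integer(A:B)] # random A:B,     var(A:B) = 4
  y  <- b + ab + rnorm(I * J * K)             # MeanSquareA = 0
  ms <- anova(lm(y ~ A * B))[, "Mean Sq"]     # MSA, MSB, MS(A:B), MSE
  c(F.AB = ms[1] / ms[3], F.E = ms[1] / ms[4])
}
sim <- replicate(2000, one.sim())
rowMeans(sim)  # F.AB near E[F(2, 8)] = 4/3; F.E far above that
```

The first ratio averages about 4/3 (the mean of an F(2, 8) variate), so referring it to F(2, 8) is valid; the second averages an order of magnitude higher, so MSA/MSE would grossly overstate the evidence against A whenever var(A:B) > 0.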
>
> Does one want to test whether var(A:B) = 0 because the F-test assumes
> it? That is, as var(A:B) increases, the variance ratio
> MS(A: fixed)/MS(A:B, random) decreases, artifactually reducing the
> "significance" of J*K*MeanSquareA?
SG: Scheffé recommends testing var(A:B) = 0 using MS(A:B)/MSE, which
follows an F distribution under both the null and alternative hypotheses
(but scaled differently under the alternative). This is a monotonic
transformation of the standard likelihood ratio test, and it makes sense
to me.
SG: Scheffé's recommended test for MeanSquareA = 0 is MSA/MS(A:B). I
haven't worked out the details, but it looks like this assumes that the
A:B interaction may be real without an A effect, which I think violates
the basic principle of hierarchy. There are situations where an A:B
interaction can exist without the main effect, but that rests on a
particular parameterization, and we have not assumed one in this context.
SG: I think this is an argument for preferring the likelihood ratio to
Scheffé's traditional answer. In the spirit of Doug's comment about
different tests, we should be clear about what each procedure tests, and
then select the procedure that seems closest to the problem we really
want to solve, rather than simply relying on a test statistic whose
distribution is better understood. This is no criticism of Scheffé:
he did not have the computing tools in 1959 that we have today.
>> The same argument applies to Peter's "simpler" case discussed above:
>> With "Y_ij = mu + z_i + eps_ij", it only rarely makes sense to test
>> mu=0 while assuming var(z) != 0. In the balanced 2-way,
>> mixed-effects analysis, the Neyman-Pearson thing to do, I would
>> think, would be to test simultaneously MeanSquareA = 0 with var(A:B)
>> = 0. In lmer, I might write this as follows:
>> anova(lmer(y~A+(A|B)), lmer(y~1+(1|B))).
>> However, this does NOT match the standard analysis associated
>> with this design, does it? To check this, I considered problem 8.1
>> in Scheffé (p. 289), which compares 3 different nozzles (fixed
>> effect) tested by 5 different operators (random effect). The data
>> are as follows:
>> y <- c(6,6,-15, 26,12,5, 11,4,4, 21,14,7, 25,18,25,
>> 13,6,13, 4,4,11, 17,10,17, -5,2,-5, 15,8,1,
>> 10,10,-11, -35,0,-14, 11,-10,-17, 12,-2,-16, -4,10,24)
>>
>> Nozzle <- data.frame(Nozzle=rep(LETTERS[1:3], e=15),
>> Operator=rep(letters[1:5], e=3), flowRate=y)
>>
>> The traditional analysis can be obtained from
>> anova(lm(flowRate~Nozzle*Operator, ...)), but comparing MeanSq.Nozzle
>> to MeanSq.Nozzle:Operator rather than MeanSquareResidual, as follows:
>> > fitAB0 <- lm(flowRate~Nozzle*Operator, data=Nozzle)
>> > (aov.AB0 <- anova(fitAB0))
>> Analysis of Variance Table
>>
>> Response: flowRate
>>                 Df  Sum Sq Mean Sq F value   Pr(>F)
>> Nozzle           2 1426.98  713.49  7.0456 0.003101 **
>> Operator         4  798.80  199.70  1.9720 0.124304
>> Nozzle:Operator  8 1821.47  227.68  2.2484 0.051640 .
>> Residuals       30 3038.00  101.27
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Scheffé must have computed the following (the test MSA/MS(A:B)
>> has 2 and 8 degrees of freedom here):
>> > (F.A.Scheffe <- aov.AB0[1, "Mean Sq"]/aov.AB0[3, "Mean Sq"])
>> [1] 3.133690
>> > pf(F.A.Scheffe, 2, 8, lower.tail=FALSE)
>> [1] 0.0988514
>>
>> However, I think I prefer the likelihood ratio answer to this,
>> because I think it is better to have an approximate solution to the
>> exact problem than an exact solution to the approximate problem. [I
>> got this from someone else, perhaps Tukey, but I don't have a
>> citation.] I can get this likelihood ratio answer from either lme
>> or lmer.
>> When I tried to fit this model with 'lme', it didn't want to
>> converge:
>> library(nlme)
>> fitAB. <- lme(flowRate~Nozzle, random=~Nozzle|Operator,
>> data=Nozzle, method="ML")
>> Error in lme.formula(flowRate ~ Nozzle, random = ~Nozzle | Operator,
>> data = Nozzle, :
>> nlminb problem, convergence error code = 1; message = iteration
>> limit reached without convergence (9)
>>
>> After several false starts, I got the following to work (fitB. is
>> the null model compared against below):
>> fitAB. <- lme(flowRate~Nozzle, random=~Nozzle|Operator,
>> data=Nozzle, method="ML",
>> control=lmeControl(opt="optim"))
>> fitB. <- lme(flowRate~1, random=~1|Operator,
>> data=Nozzle, method="ML")
>> > anova(fitAB., fitB.)
>>        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
>> fitAB.     1 10 361.9022 379.9688 -170.9511
>> fitB.      2  3 361.3637 366.7837 -177.6819 1 vs 2 13.46153  0.0616
>>
>>
>> I got essentially the same answer from lmer (without the
>> convergence problem, but quitting R in between):
>> > fitAB <- lmer(flowRate~Nozzle+(Nozzle|Operator),
>> + data=Nozzle, method="ML")
>> > fitB <- lmer(flowRate~1+(1|Operator), data=Nozzle,
>> + method="ML")
>> > anova(fitAB, fitB)
>> Data: Nozzle
>> Models:
>> fitB: flowRate ~ 1 + (1 | Operator)
>> fitAB: flowRate ~ Nozzle + (Nozzle | Operator)
>>        Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>> fitB    2 359.36 362.98 -177.68
>> fitAB   9 359.88 376.14 -170.94 13.479      7    0.06126 .
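[Editor's note: the chi-square reference for this likelihood ratio test is itself approximate, since var(A:B) = 0 sits on the boundary of the parameter space and n is small. One alternative is a parametric bootstrap. The sketch below is mine, not from the thread; it assumes a current lme4 whose simulate() and refit() methods did not exist in the 2006 lmer, and nsim = 200 is an arbitrary small choice.]

```r
## Parametric-bootstrap p-value for the fitAB vs fitB comparison,
## using the Nozzle data from above.
library(lme4)
y <- c(6,6,-15, 26,12,5, 11,4,4, 21,14,7, 25,18,25,
       13,6,13, 4,4,11, 17,10,17, -5,2,-5, 15,8,1,
       10,10,-11, -35,0,-14, 11,-10,-17, 12,-2,-16, -4,10,24)
Nozzle <- data.frame(Nozzle   = rep(LETTERS[1:3], each = 15),
                     Operator = rep(rep(letters[1:5], each = 3), times = 3),
                     flowRate = y)
fitAB <- lmer(flowRate ~ Nozzle + (Nozzle | Operator), data = Nozzle, REML = FALSE)
fitB  <- lmer(flowRate ~ 1 + (1 | Operator),           data = Nozzle, REML = FALSE)
obs <- 2 * as.numeric(logLik(fitAB) - logLik(fitB))   # observed LRT
                                                      # (anova above: 13.479)
set.seed(20060910)
sim <- replicate(200, {
  ysim <- unlist(simulate(fitB))        # data generated under the null fit
  2 * as.numeric(logLik(refit(fitAB, ysim)) - logLik(refit(fitB, ysim)))
})
mean(sim >= obs)                        # simulation-based p-value
```

Some refits on simulated null data may produce singular-fit or convergence warnings; that is itself informative about how hard this 9-parameter model is to estimate from 45 observations.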
>> Comments?
>> Spencer Graves
>> p.s. For the lme fit:
>> > sessionInfo()
>> Version 2.3.1 Patched (2006-08-13 r38872)
>> i386-pc-mingw32
>>
>> attached base packages:
>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
>> [7] "base"
>> other attached packages:
>> nlme
>> "3.1-75"