[R-sig-ME] Desperately seeking help with simple lmer fit (lmer vs Proc Mixed)
Douglas Bates
bates at stat.wisc.edu
Wed Feb 18 21:20:58 CET 2009
On Wed, Feb 18, 2009 at 1:56 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> Thanks for the example.
>
> The estimates of the variance components are the result of an
> optimization and it appears that there are two local optima for this
> problem. I'm sorry to say that the optimum determined by SAS PROC
> MIXED represents a better fit (a lower deviance) than the one
> determined by lmer.
>
> If you plot the response versus run, say
>
> dotplot(reorder(Run, Y) ~ Y)
>
> you will see that there is one run (number 13) with unusually large Y,
> which may have abnormal influence. I was able to reproduce the SAS
> results by modifying the starting estimates. This is not optimal
> because it means you need to know the correct answer before you can
> calculate it.
>
>> (fm1 <- lmer(Y ~ (1|Run) + (1|Prep), pr, verbose = TRUE))
> 0: 1536.6629: 0.666667 0.384900
> 1: 1441.4487: 1.44044 1.01837
> 2: 1421.4662: 2.35838 0.621655
> 3: 1408.1762: 3.27848 0.229969
> 4: 1401.9874: 4.04888 0.00000
> 5: 1399.8678: 4.59188 0.00000
> 6: 1398.9477: 5.12145 0.00000
> 7: 1398.7722: 5.41633 0.00000
> 8: 1398.7543: 5.53332 0.00000
> 9: 1398.7539: 5.55421 0.00000
> 10: 1398.7539: 5.55527 1.68436e-07
> 11: 1398.7539: 5.55527 1.68436e-07
> Linear mixed model fit by REML
> Formula: Y ~ (1 | Run) + (1 | Prep)
> Data: pr
> AIC BIC logLik deviance REMLdev
> 1407 1417 -699.4 1410 1399
> Random effects:
> Groups Name Variance Std.Dev.
> Run (Intercept) 3.5859e+05 5.9882e+02
> Prep (Intercept) 3.2965e-10 1.8156e-05
> Residual 1.1619e+04 1.0779e+02
> Number of obs: 108, groups: Run, 18; Prep, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 28216.5 141.5 199.4
>> (fm1a <- lmer(Y ~ (1|Run) + (1|Prep), pr, verbose = TRUE, start = c(4,4)))
> 0: 1393.6144: 4.00000 4.00000
> 1: 1393.5078: 3.74366 4.09884
> 2: 1393.4819: 3.72800 4.32664
> 3: 1393.4810: 3.73535 4.37362
> 4: 1393.4810: 3.73373 4.37885
> 5: 1393.4810: 3.73404 4.37877
> 6: 1393.4810: 3.73404 4.37875
> 7: 1393.4810: 3.73404 4.37875
> Linear mixed model fit by REML
> Formula: Y ~ (1 | Run) + (1 | Prep)
> Data: pr
> AIC BIC logLik deviance REMLdev
> 1401 1412 -696.7 1406 1393
> Random effects:
> Groups Name Variance Std.Dev.
> Run (Intercept) 162010 402.50
> Prep (Intercept) 222784 472.00
> Residual 11619 107.79
> Number of obs: 108, groups: Run, 18; Prep, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 28216 215 131.2
>
> If you remove Run 13 you do get results like those from SAS using lmer
>
>> (fm2 <- lmer(Y ~ (1|Run) + (1|Prep), pr, verbose = TRUE, subset = Run != 13))
> 0: 1426.8110: 0.666667 0.396059
> 1: 1342.0591: 1.47666 0.982502
> 2: 1325.5670: 2.35364 0.501989
> 3: 1311.1286: 3.60322 0.00000
> 4: 1309.3047: 3.96698 0.00000
> 5: 1307.8040: 4.56388 1.65053e-05
> 6: 1307.5200: 4.88366 6.39529e-05
> 7: 1307.4763: 5.04660 0.000170199
> 8: 1307.4746: 5.08457 0.000341660
> 9: 1307.4745: 5.08786 0.000627531
> 10: 1307.4745: 5.08806 0.00113168
> 11: 1307.4743: 5.09250 0.0198356
> 12: 1306.7915: 5.55085 2.00076
> 13: 1306.7202: 5.57778 2.29410
> 14: 1304.5962: 3.26016 2.72067
> 15: 1304.1219: 3.90630 4.98691
> 16: 1303.9334: 3.70822 4.85924
> 17: 1303.7839: 3.39994 4.44476
> 18: 1303.7039: 3.78966 4.10571
> 19: 1303.6539: 3.50693 3.67340
> 20: 1303.6333: 3.63992 3.74762
> 21: 1303.6285: 3.55042 3.87085
> 22: 1303.6259: 3.59760 3.87143
> 23: 1303.6258: 3.59381 3.86414
> 24: 1303.6258: 3.59257 3.86456
> 25: 1303.6258: 3.59308 3.86577
> 26: 1303.6258: 3.59318 3.86504
> 27: 1303.6258: 3.59312 3.86514
> Linear mixed model fit by REML
> Formula: Y ~ (1 | Run) + (1 | Prep)
> Data: pr
> Subset: Run != 13
> AIC BIC logLik deviance REMLdev
> 1312 1322 -651.8 1316 1304
> Random effects:
> Groups Name Variance Std.Dev.
> Run (Intercept) 135857 368.59
> Prep (Intercept) 157206 396.49
> Residual 10523 102.58
> Number of obs: 102, groups: Run, 17; Prep, 6
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 28161.8 185.5 151.8
>
>
> This example may be motivation for me to try out a modification in the
> optimization method that I have been contemplating.
Well, the good news is that the modification was successful on this example
> (fm1 <- lmer(Y ~ 1 + (1|Run) + (1|Prep), pr, verbose = TRUE))
0: 1536.6629: 0.816497 0.620403
1: 1405.8868: 1.66562 1.14860
2: 1399.0326: 1.71590 1.44355
3: 1394.7782: 1.84561 1.71319
4: 1393.9976: 1.79780 2.00855
5: 1393.9690: 2.09290 2.05797
6: 1393.5053: 1.96491 2.07162
7: 1393.4829: 1.92326 2.08972
8: 1393.4810: 1.93265 2.09030
9: 1393.4810: 1.93240 2.09189
10: 1393.4810: 1.93236 2.09254
11: 1393.4810: 1.93237 2.09255
Linear mixed model fit by REML
Formula: Y ~ 1 + (1 | Run) + (1 | Prep)
Data: pr
AIC BIC logLik deviance REMLdev
1401 1412 -696.7 1406 1393
Random effects:
Groups Name Variance Std.Dev.
Run (Intercept) 162010 402.50
Prep (Intercept) 222784 472.00
Residual 11619 107.79
Number of obs: 108, groups: Run, 18; Prep, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 28216 215 131.2
The bad news is that I made the modification in a development version
of the lme4 package and that version won't be ready for prime time for
a while. I still have a lot of work to do on the generalized linear
mixed models.
> On Wed, Feb 18, 2009 at 12:41 PM, David LeBlond
> <David.LeBlond at abbott.com> wrote:
>>
>> Dear Dr. Bates,
>>
>> First, apologies for this request .... however I am desperate. I have spent
>> 3 days on this. I must have a very fundamental misunderstanding of lmer or
>> factors or grouped data or syntax or something else.
>>
>> To show due diligence, I have read all I can find on lmer syntax, gone
>> through the SASmixed Package including the "lmer for SAS PROC MIXED Users",
>> other presentations from yourself and others on lmer. I have visited the
>> blogs. In the last 24h on this problem I have tried every permutation I can
>> think of.
>>
>> The experiment is a simple balanced 3 level nested variance component study
>> virtually identical to Example 4.4, page 156 of Little et al, SAS System for
>> Mixed Models. There are 3 variables: There are 6 Y results nested within
>> each of 18 runs. There are 3 runs nested within each of 6 preps. Site is
>> ignored.
>>
>> I am using the lmer code analagous to what you used to analyze Example 4.4
>> in lmer:
>>
>> lmer syntax: lmer(Y~1+(1|Prep/Run))
>> MIXED syntax: proc mixed;class run prep site; model
>> y=/s;random prep run(prep);run;
>>
>> The error variance components and fixed effect estimates agree, but the
>> variance components for Prep and Run are completely different.
>>
>> Source lmer SAS
>> Run 358580 162010
>> Prep 0.012776 222784
>>
>> For a nearly identical problem, (Semi2 in SASmixed) lmer and MIXED give
>> identical results. As I can repeat the SAS result in JMP, I trust it. I
>> believe I am making some error in my use of lmer. I need to complete the
>> analysis in lmer, not SAS or JMP.
>>
>> Can you or someone please explain to me why the variance component estimates
>> are so radically different between MIXED and lmer? I know I am making some
>> dumb mistake and I am prepared to be humbled.
>>
>> Thank you so much!!
>>
>> Dave
>> ***************************************************************
>> PS - Below I am sending my lmer data (a dif file), code and output along
>> with the SAS data (in code), code and output.
>>
>>
>> Here is the Excel data needed for R:
>>
>>
>> Below is the R code which also contains the outputs and SAS data and code
>> (sorry for length - 108 lines of data).
>>
>> library(lattice)
>> library(foreign)
>> library(lme4)
>> Suitability<-read.DIF("C:/Documents and Settings/leblodj/Desktop/Synthroid
>> Dissolution Method Change Lynn Davis Feb09/Suitability.dif",
>> header=TRUE,transpose=TRUE,nrows=369,colClasses="character")
>>
>> str(Suitability)
>> Y<-as.numeric(Suitability$Y)
>> Site<-ordered(Suitability$Site)
>> Prep<-ordered(Suitability$Prep)
>> Run<-ordered(Suitability$Run)
>> levels(Run)
>> levels(Prep)
>> lmer(Y~1+(1|Prep/Run))
>>
>> ####### lmer output #################
>> #Linear mixed model fit by REML
>> #Formula: Y ~ 1 + (1 | Prep/Run)
>> # AIC BIC logLik deviance REMLdev
>> # 1407 1417 -699.4 1410 1399
>> #Random effects:
>> # Groups Name Variance Std.Dev.
>> # Run:Prep (Intercept) 3.5858e+05 598.81878
>> # Prep (Intercept) 1.2776e-02 0.11303
>> # Residual 1.1619e+04 107.79332
>> #Number of obs: 108, groups: Run:Prep, 18; Prep, 6
>> #
>> #Fixed effects:
>> # Estimate Std. Error t value
>> #(Intercept) 28216.5 141.5 199.4
>> #######################################
>>
>>
>>
>> #############################################
>>
>>
>> ##### SAS Proc Mixed Output (matches JMP) ##
>> ##### See full SAS code Below ##############
>> # Covariance Parameter
>> # Estimates
>> #
>> #Cov Parm Estimate
>> #
>> #Prep 222784
>> #Run(Prep) 162010
>> #Residual 11619
>> Solution for Fixed Effects
>> Standard
>> Effect Estimate Error DF t Value Pr > |t|
>> Intercept 28216 215.03 5 131.22 <.0001
>> ############################################
>> ###### SAS Proc Mixed Code ##########
>> data Suitability;
>> input Site$ Prep$ Run$ Y;
>> cards;
>> 2 1 1 28325
>> 2 1 1 28314
>> 2 1 1 28277
>> 2 1 1 28183
>> 2 1 1 28423
>> 2 1 1 28299
>> 2 1 2 27578
>> 2 1 2 27724
>> 2 1 2 27541
>> 2 1 2 27686
>> 2 1 2 27679
>> 2 1 2 27567
>> 2 1 3 27207
>> 2 1 3 27351
>> 2 1 3 27416
>> 2 1 3 27292
>> 2 1 3 27168
>> 2 1 3 27117
>> 2 2 4 28036
>> 2 2 4 28042
>> 2 2 4 27902
>> 2 2 4 28110
>> 2 2 4 27794
>> 2 2 4 28181
>> 2 2 5 28018
>> 2 2 5 28318
>> 2 2 5 28170
>> 2 2 5 28410
>> 2 2 5 28444
>> 2 2 5 28620
>> 2 2 6 27764
>> 2 2 6 28008
>> 2 2 6 27849
>> 2 2 6 28096
>> 2 2 6 27788
>> 2 2 6 27884
>> 2 3 7 28201
>> 2 3 7 28224
>> 2 3 7 28206
>> 2 3 7 28462
>> 2 3 7 28290
>> 2 3 7 28303
>> 2 3 8 28244
>> 2 3 8 28125
>> 2 3 8 28078
>> 2 3 8 28023
>> 2 3 8 28037
>> 2 3 8 28014
>> 2 3 9 28738
>> 2 3 9 28726
>> 2 3 9 28765
>> 2 3 9 28534
>> 2 3 9 28518
>> 2 3 9 28489
>> 1 4 10 28717
>> 1 4 10 28622
>> 1 4 10 28790
>> 1 4 10 28697
>> 1 4 10 28531
>> 1 4 10 28502
>> 1 4 11 28478
>> 1 4 11 28425
>> 1 4 11 28609
>> 1 4 11 28541
>> 1 4 11 28518
>> 1 4 11 28492
>> 1 4 12 28808
>> 1 4 12 28764
>> 1 4 12 28745
>> 1 4 12 28786
>> 1 4 12 28773
>> 1 4 12 28622
>> 1 5 13 29405
>> 1 5 13 29372
>> 1 5 13 29435
>> 1 5 13 29831
>> 1 5 13 29550
>> 1 5 13 29402
>> 1 5 14 28708
>> 1 5 14 28684
>> 1 5 14 28772
>> 1 5 14 28705
>> 1 5 14 28749
>> 1 5 14 28754
>> 1 5 15 28559
>> 1 5 15 28584
>> 1 5 15 28636
>> 1 5 15 28713
>> 1 5 15 28652
>> 1 5 15 28530
>> 1 6 16 26991
>> 1 6 16 27009
>> 1 6 16 26949
>> 1 6 16 26957
>> 1 6 16 26965
>> 1 6 16 26934
>> 1 6 17 28174
>> 1 6 17 28074
>> 1 6 17 28113
>> 1 6 17 28154
>> 1 6 17 28200
>> 1 6 17 27952
>> 1 6 18 27628
>> 1 6 18 27569
>> 1 6 18 27624
>> 1 6 18 27737
>> 1 6 18 27675
>> 1 6 18 27654
>> ;
>>
>> proc mixed;
>> class run prep site;
>> model y=/s;
>> random prep run(prep);run;
>>
>>
>> ________________________________
>> David LeBlond
>> Senior Research Statistician
>> MS Statistics, MS/PhD Biochemistry
>> Exploratory Statistics Global Pharmaceutical R&D
>> 100 Abbott Park Road
>> Abbott Park, IL 60064-3500
>> USA R13-1 847-935-6031
>> AP9A-1 847-935-1899
>> David.LeBlond at abbott.com
>>
>> ________________________________
>> This communication may contain information that is proprietary,
>> confidential, or exempt from disclosure. If you are not the intended
>> recipient, please note that any other dissemination, distribution, use or
>> copying of this communication is strictly prohibited. Anyone who receives
>> this message in error should notify the sender immediately by telephone or
>> by return e-mail and delete it from his or her computer.
>>
>
More information about the R-sig-mixed-models
mailing list