[R-sig-ME] Mixed Models in SAS and R

Ben Bolker bbolker at gmail.com
Thu Feb 15 23:21:29 CET 2018


   Everything here is true, or at least reasonable, but not related to
the original question.

 1. I believe the original question has identified (1) a potentially
interesting question about differences between REML and ML estimation
for a poorly posed model [alternatively one could say "it's poorly
posed, I don't care how it works"] and (2) a potential
problem/bug/infelicity where lmer fails to warn of the ill-posedness _on
some platforms_. (The conversation is continuing on CrossValidated:
https://stats.stackexchange.com/questions/328712/why-does-lmer-unlike-sas-output-a-non-zero-variance-of-random-effect-if-it-is/328845
)

2. If you can figure out how to make lmer work with negative variances
without completely breaking the linear algebra, I'd be interested to
hear about it. (I don't think I can say for sure that I'd accept a pull
request -- Doug and Martin might have perfectly reasonable philosophical
reservations.)  I suspect it will be very hard or impossible, although
I'm happy to be proved wrong.

3. IMO the correct way to handle the case you've suggested is to allow
for compound symmetric variance structures.  I believe glmmTMB can do
this, and the flexLambda branch of lme4 can do this, and it can
certainly be hacked by someone who wants to do it. I would dearly like
to find the time and energy to make the production version of lme4
capable of this, but holding your breath is not recommended.

On 18-02-15 04:07 PM, John Maindonald wrote:
> The lme4 code, in a practice designed to preserve its vision of mathematical purity,
> constrains the variance estimates to be non-negative.  That makes sense if one
> can be sure that the model is tuned to the circumstances that generated the data.
> 
> The negative variances that some of the other mixed modeling software allows can
> be a useful diagnostic, indicating for example an inappropriate experimental design.
> I have mentioned previously the example once mentioned to me, and which would
> be easy to simulate, where blocks in a field block experimental design had been
> laid out at right angles to a river bank.  A negative block variance estimate was
> required to give a variance-covariance matrix which had all the mathematical
> respectability that anyone might want.
> 
> There are circumstances where it is useful to constrain the variance estimates to
> be non-negative — it would be useful to have a choice.
> 
> 
> John Maindonald             email: john.maindonald at anu.edu.au<mailto:john.maindonald at anu.edu.au>
> 
> On 16/02/2018, at 07:25, Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov>> wrote:
> 
> And this might be a very dumb question, but I thought that the Hessian was related to the variance estimates. SO if the Hessian has a negative eigenvalue, wouldn’t one of the variance estimates be negative?.....I thought that is what happens in SAS, and instead of reporting a negative variance estimate it bounds the estimate at 0.
> 
> But it has been a while since I have gotten deep into the theory behind these models, so I may be completely wrong.....and if so, we can just leave it at that.
> 
> -----Original Message-----
> From: Baldwin, Jim -FS [mailto:jbaldwin at fs.fed.us]
> Sent: Thursday, February 15, 2018 11:50 AM
> To: Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov>>; Peter Claussen <dakotajudo at mac.com<mailto:dakotajudo at mac.com>>
> Cc: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
> Subject: RE: [R-sig-ME] Mixed Models in SAS and R
> 
> Shows up for me at the very end of the summary.  What version of R and lme4 are you using?
> 
> summary(l)
> Linear mixed model fit by REML ['lmerMod']
> Formula: y ~ (1 | site) + site
>   Data: fake_data
> 
> REML criterion at convergence: 49.7
> 
> Scaled residuals:
>    Min      1Q  Median      3Q     Max
> -1.8662 -0.5452 -0.1016  0.7029  1.8630
> 
> Random effects:
> Groups   Name        Variance Std.Dev.
> site     (Intercept) 4.5429   2.1314
> Residual             0.7182   0.8475
> Number of obs: 20, groups:  site, 2
> 
> Fixed effects:
>            Estimate Std. Error t value
> (Intercept)    6.171      2.148   2.872
> site2         -2.808      3.038  -0.924
> 
> Correlation of Fixed Effects:
>      (Intr)
> site2 -0.707
> convergence code: 0
> unable to evaluate scaled gradient
> Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
> 
> 
> 
> 
> Jim Baldwin, PhD
> Station Statistician
> Forest Service
> Pacific Southwest Research Station
> 
> 
> 
> -----Original Message-----
> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Bertke, Stephen (CDC/NIOSH/DSHEFS)
> Sent: Thursday, February 15, 2018 8:46 AM
> To: Peter Claussen <dakotajudo at mac.com<mailto:dakotajudo at mac.com>>
> Cc: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
> Subject: Re: [R-sig-ME] Mixed Models in SAS and R
> 
> I do not see any errors in R. What I posted is the entire output from running summary(). Is there something else I should be asking for?
> 
> From: Peter Claussen [mailto:dakotajudo at mac.com]
> Sent: Thursday, February 15, 2018 11:15 AM
> To: Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov>>
> Cc: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
> Subject: Re: [R-sig-ME] Mixed Models in SAS and R
> 
> I posted a response on stackexchange; I don’t think this is an R specific issue - SAS reports the same problem for your test data.
> 
> R is reporting a "degenerate Hessian with 1 negative eigenvalue", SAS reports that "final Hessian is not positive definite” - by definition, a positive definite matrix is a symmetric matrix with all positive eigenvalues.
> 
> Cheers,
> 
> 
> 
> On Feb 15, 2018, at 9:56 AM, Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov><mailto:inh4 at cdc.gov>> wrote:
> 
> Sorry for all the emails, but I have edited/simplified my question to what I believe is the root issue as well as posted simulated data to test with:
> 
> https://stats.stackexchange.com/questions/328712/lmer-vs-proc-mixed-output
> 
> 
> 
> -----Original Message-----
> From: Bertke, Stephen (CDC/NIOSH/DSHEFS)
> Sent: Wednesday, February 14, 2018 2:25 PM
> To: 'Thierry Onkelinx' <thierry.onkelinx at inbo.be<mailto:thierry.onkelinx at inbo.be>>
> Cc: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
> Subject: RE: [R-sig-ME] Mixed Models in SAS and R
> 
> I posted the question on stackoverflow here:
> https://stackoverflow.com/questions/48794651/lmer-vs-proc-mixed-output
> 
> This will hopefully make reading my code and output easier.
> 
> I would expect a 0 variance since I am in essence fitting a model with both the site variable in the fixed and random part of the model. I actually went ahead and fit that "dumb" model in both SAS and R and once again, all results are nearly identical except SAS estimates a 0 variance and R estimates a relatively large positive variance. However, R now gives an error/warning at the bottom of the results indicating an issue with this very dumb model. Again, those results are in the above link.
> 
> 
> 
> 
> -----Original Message-----
> From: Thierry Onkelinx [mailto:thierry.onkelinx at inbo.be]
> Sent: Wednesday, February 14, 2018 11:39 AM
> To: Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov>>
> Cc: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
> Subject: Re: [R-sig-ME] Mixed Models in SAS and R
> 
> Dear Stephen,
> 
> The list removes all HTML formating, making your post hard to read.
> Please use only plain text when posting.
> 
> You'll need to make sure that you fit exactly the same model in SAS as in R. Not everyone here speaks SAS. Providing the math equation for the SAS model would help.
> 
> Also please elaborate why it makes sense that the variance of site should be zero. We cannot verify that statement based on the information you provide.
> 
> Best regards,
> 
> ir. Thierry Onkelinx
> Statisticus / Statistician
> 
> Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance thierry.onkelinx at inbo.be<mailto:thierry.onkelinx at inbo.be> Havenlaan 88 bus 73, 1000 Brussel www.inbo.be<http://www.inbo.be>
> 
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey ///////////////////////////////////////////////////////////////////////////////////////////
> 
> 
> 
> 
> 2018-02-14 17:07 GMT+01:00 Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov<mailto:inh4 at cdc.gov>>:
> 
> Hello everyone. I have just joined this mailing list so I wanted to introduce myself as well as ask a question.
> 
> My name is Steve Bertke and I am a researcher at the National Institute for Occupational Safety and Health (NIOSH) which is one of the centers within the Centers for Disease Control and Prevention (CDC). I am a long-time SAS user but have been slowly finding myself using and liking R. I have found the support community for R very helpful and rewarding and am looking forward to contributing my part.
> 
> On to my question.
> 
> I have ran the same model (I think) in both SAS (proc mixed) and R (lmer) but have gotten different results for the random terms.
> 
> A quick background, we took 2 personal air samples for 127 people at 15 different factories for a total of 252 samples (after dropping 2 samples). We are trying to model various factors of the factory on the air samples. To do so, we need to control for the repeated measure of the person (nested within factory) as well as the random factory effect.
> 
> You can see below for the exact code and exact results, but in short, when I run the models in R and SAS without any fixed effects, I get the same results for the random effects. When I enter in the fixed effects, I get a different result for the Site variance...but all other results are the same (both the fixed effects and the other random effects).
> 
> There is another methodological issue with the model I am running. I am entering in 17 fixed effects that describe the 15 sites. Therefore, the model is over-specified. As a result, SAS gives a variance estimate of 0 for site (which, in hindsight, makes sense) however, R does not. That is a separate issue that we are dealing with, but I would expect that R and SAS would give the same result. Or maybe at least a warning.
> 
> Below is my code and output. I hope the formatting remains so that it is easily readable for everyone. I may also be able to share the data too, but I need to get some approval for that first.
> 
> Details
> I ran the following code in SAS and R without any fixed effects and both give the same results:
> 
> proc mixed data=dat;
> class NewSiteID NIOSHID;
> model ln_i = ;
> random NewSiteID NIOSHID(NewSiteID);
> run;
> 
> Covariance Parameter Estimates
> 
> Cov Parm
> 
> Estimate
> 
> NewSiteID
> 
> 6.3433
> 
> NIOSHID(NewSiteID)
> 
> 0.7465
> 
> Residual
> 
> 2.5256
> 
> 
> mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID),
>                   data = Modeling_Database_Final)
> summary(mixedidsite)
> 
> Random effects:
> Groups                              Name        Variance Std.Dev.
> NIOSHID:NewSiteID       (Intercept) 0.7465   0.864
> NewSiteID                        (Intercept) 6.3434   2.519
> Residual                                                 2.5256   1.589
> 
> However, when I add in the fixed effects, I get different results. SAS gives an estimate of 0 for Site while R does not. All other results are the same:
> 
> proc mixed data=dat;
> class NewSiteID NIOSHID F_mass_handled_or (ref=first); model ln_i = F_High_Exp F_Mat_Type_SW F_Mat_Type_CNF F_Mat_Form_Dry F_Mat_Form_Liq F_Mat_Form_Comp F_Hybrid F_Primary F_Coatings F_mass_handled_or F_adequate F_inadequate F_emp_sc F_diam_sc/solution; random NewSiteID NIOSHID(NewSiteID); run;
> 
> Covariance Parameter Estimates
> 
> Cov Parm
> 
> Estimate
> 
> NewSiteID
> 
> 0
> 
> NIOSHID(NewSiteID)
> 
> 0.6954
> 
> Residual
> 
> 2.5372
> 
> 
> Fit Statistics
> 
> -2 Res Log Likelihood
> 
> 961.8
> 
> AIC (Smaller is Better)
> 
> 965.8
> 
> AICC (Smaller is Better)
> 
> 965.9
> 
> BIC (Smaller is Better)
> 
> 967.3
> 
> 
> Solution for Fixed Effects
> 
> Effect
> 
> F_mass_handled_or
> 
> Estimate
> 
> Standard
> Error
> 
> DF
> 
> t Value
> 
> Pr > |t|
> 
> Intercept
> 
> 
> 
> -139.29
> 
> 83.6162
> 
> 92.1
> 
> -1.67
> 
> 0.0991
> 
> F_High_Exp
> 
> 
> 
> -180.24
> 
> 102.72
> 
> 92.6
> 
> -1.75
> 
> 0.0826
> 
> F_Mat_Type_SW
> 
> 
> 
> 470.30
> 
> 261.53
> 
> 92.1
> 
> 1.80
> 
> 0.0754
> 
> F_Mat_Type_CNF
> 
> 
> 
> -636.35
> 
> 347.99
> 
> 92.1
> 
> -1.83
> 
> 0.0707
> 
> F_Mat_Form_Dry
> 
> 
> 
> 662.26
> 
> 368.75
> 
> 92
> 
> 1.80
> 
> 0.0758
> 
> F_Mat_Form_Liq
> 
> 
> 
> -583.14
> 
> 318.54
> 
> 92
> 
> -1.83
> 
> 0.0704
> 
> F_Mat_Form_Comp
> 
> 
> 
> 598.77
> 
> 331.85
> 
> 92
> 
> 1.80
> 
> 0.0745
> 
> F_Hybrid
> 
> 
> 
> -1197.69
> 
> 658.23
> 
> 92
> 
> -1.82
> 
> 0.0721
> 
> F_Primary
> 
> 
> 
> -639.93
> 
> 352.80
> 
> 92
> 
> -1.81
> 
> 0.0730
> 
> F_Coatings
> 
> 
> 
> 92.7949
> 
> 50.7416
> 
> 92.1
> 
> 1.83
> 
> 0.0707
> 
> F_mass_handled_or
> 
> F2
> 
> 134.91
> 
> 77.2496
> 
> 92
> 
> 1.75
> 
> 0.0841
> 
> F_mass_handled_or
> 
> F3
> 
> -1235.37
> 
> 677.42
> 
> 92
> 
> -1.82
> 
> 0.0715
> 
> F_mass_handled_or
> 
> F4
> 
> 137.71
> 
> 75.9370
> 
> 92
> 
> 1.81
> 
> 0.0730
> 
> F_mass_handled_or
> 
> F1
> 
> 0
> 
> .
> 
> .
> 
> .
> 
> .
> 
> F_adequate
> 
> 
> 
> 139.45
> 
> 76.5607
> 
> 92
> 
> 1.82
> 
> 0.0718
> 
> F_inadequate
> 
> 
> 
> 1395.04
> 
> 767.40
> 
> 92
> 
> 1.82
> 
> 0.0723
> 
> F_emp_sc
> 
> 
> 
> -1259.21
> 
> 691.98
> 
> 92
> 
> -1.82
> 
> 0.0721
> 
> F_diam_sc
> 
> 
> 
> -20.1875
> 
> 9.9710
> 
> 89.5
> 
> -2.02
> 
> 0.0459
> 
> 
> mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID)  + F_High_Exp + F_Mat_Type_SW +
>                     F_Mat_Type_CNF + F_Mat_Form_Dry + F_Mat_Form_Liq + F_Mat_Form_Comp +
>                     F_Hybrid + F_Primary + F_Coatings + F_adequate +
>                     F_inadequate + F_emp_sc + F_diam_sc,
>                   data = Modeling_Database_Final)
> summary(mixedidsite)
> 
> REML criterion at convergence: 961.8
> 
> Scaled residuals:
>   Min      1Q  Median      3Q     Max
> -2.3019 -0.5248 -0.1668  0.3511  4.7091
> 
> Random effects:
> Groups                              Name        Variance Std.Dev.
> NIOSHID:NewSiteID       (Intercept) 0.6954   0.8339
> NewSiteID                        (Intercept) 1.5932   1.2622
> Residual                                                2.5372   1.5929
> Number of obs: 252, groups:  NIOSHID:NewSiteID, 127; NewSiteID, 15
> 
> Fixed effects:
>                                                          Estimate Std. Error t value
> (Intercept)                                       -139.292     83.702  -1.664
> F_High_Exp                                      -180.239    102.778  -1.754
> F_Mat_Type_SW                            470.297    261.537   1.798
> F_Mat_Type_CNF                           -636.352    347.993  -1.829
> F_Mat_Form_Dry                            662.263    368.761   1.796
> F_Mat_Form_Liq                            -583.142    318.541  -1.831
> F_Mat_Form_Comp                       598.774    331.856   1.804
> F_Hybrid                                           -1197.691    658.229  -1.820
> F_Primary                                        -639.928    352.811  -1.814
> F_Coatings                                       92.795     50.804   1.826
> F_mass_handled_orF2: 10 -         134.912     77.291   1.746
> F_mass_handled_orF3: 101         -1235.372    677.423  -1.824
> F_mass_handled_orF4: >1 k        137.714     75.958   1.813
> F_adequate                                     139.446     76.582   1.821
> F_inadequate                                  1395.036    767.413   1.818
> F_emp_sc                                         -1259.213    691.979  -1.820
> F_diam_sc                                        -20.188      9.971  -2.025
> 
> Again, the SAS results make sense...that there is 0 variance left over from the fully specified fixed effects. I am fairly certain the combination of the fixed effects uniquely identifies each facility. However, why doesn't R give a 0 variance? What is different between the two methods?
> 
> 
>       [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> _______________________________________________
> R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> 
> 
> This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



More information about the R-sig-mixed-models mailing list