[R-sig-ME] Mixed Models in SAS and R

Thierry Onkelinx thierry.onkelinx at inbo.be
Wed Feb 14 17:38:41 CET 2018


Dear Stephen,

The list removes all HTML formating, making your post hard to read.
Please use only plain text when posting.

You'll need to make sure that you fit exactly the same model in SAS as
in R. Not everyone here speaks SAS. Providing the math equation for
the SAS model would help.

Also please elaborate why it makes sense that the variance of site
should be zero. We cannot verify that statement based on the
information you provide.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




2018-02-14 17:07 GMT+01:00 Bertke, Stephen (CDC/NIOSH/DSHEFS) <inh4 at cdc.gov>:
> Hello everyone. I have just joined this mailing list so I wanted to introduce myself as well as ask a question.
>
> My name is Steve Bertke and I am a researcher at the National Institute for Occupational Safety and Health (NIOSH) which is one of the centers within the Centers for Disease Control and Prevention (CDC). I am a long-time SAS user but have been slowly finding myself using and liking R. I have found the support community for R very helpful and rewarding and am looking forward to contributing my part.
>
> On to my question.
>
> I have ran the same model (I think) in both SAS (proc mixed) and R (lmer) but have gotten different results for the random terms.
>
> A quick background, we took 2 personal air samples for 127 people at 15 different factories for a total of 252 samples (after dropping 2 samples). We are trying to model various factors of the factory on the air samples. To do so, we need to control for the repeated measure of the person (nested within factory) as well as the random factory effect.
>
> You can see below for the exact code and exact results, but in short, when I run the models in R and SAS without any fixed effects, I get the same results for the random effects. When I enter in the fixed effects, I get a different result for the Site variance...but all other results are the same (both the fixed effects and the other random effects).
>
> There is another methodological issue with the model I am running. I am entering in 17 fixed effects that describe the 15 sites. Therefore, the model is over-specified. As a result, SAS gives a variance estimate of 0 for site (which, in hindsight, makes sense) however, R does not. That is a separate issue that we are dealing with, but I would expect that R and SAS would give the same result. Or maybe at least a warning.
>
> Below is my code and output. I hope the formatting remains so that it is easily readable for everyone. I may also be able to share the data too, but I need to get some approval for that first.
>
> Details
> I ran the following code in SAS and R without any fixed effects and both give the same results:
>
> proc mixed data=dat;
> class NewSiteID NIOSHID;
> model ln_i = ;
> random NewSiteID NIOSHID(NewSiteID);
> run;
>
> Covariance Parameter Estimates
>
> Cov Parm
>
> Estimate
>
> NewSiteID
>
> 6.3433
>
> NIOSHID(NewSiteID)
>
> 0.7465
>
> Residual
>
> 2.5256
>
>
> mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID),
>                     data = Modeling_Database_Final)
> summary(mixedidsite)
>
> Random effects:
> Groups                              Name        Variance Std.Dev.
> NIOSHID:NewSiteID       (Intercept) 0.7465   0.864
>  NewSiteID                        (Intercept) 6.3434   2.519
>  Residual                                                 2.5256   1.589
>
> However, when I add in the fixed effects, I get different results. SAS gives an estimate of 0 for Site while R does not. All other results are the same:
>
> proc mixed data=dat;
> class NewSiteID NIOSHID F_mass_handled_or (ref=first);
> model ln_i = F_High_Exp
> F_Mat_Type_SW
> F_Mat_Type_CNF
> F_Mat_Form_Dry
> F_Mat_Form_Liq
> F_Mat_Form_Comp
> F_Hybrid
> F_Primary
> F_Coatings
> F_mass_handled_or
> F_adequate
> F_inadequate
> F_emp_sc
> F_diam_sc/solution;
> random NewSiteID NIOSHID(NewSiteID);
> run;
>
> Covariance Parameter Estimates
>
> Cov Parm
>
> Estimate
>
> NewSiteID
>
> 0
>
> NIOSHID(NewSiteID)
>
> 0.6954
>
> Residual
>
> 2.5372
>
>
> Fit Statistics
>
> -2 Res Log Likelihood
>
> 961.8
>
> AIC (Smaller is Better)
>
> 965.8
>
> AICC (Smaller is Better)
>
> 965.9
>
> BIC (Smaller is Better)
>
> 967.3
>
>
> Solution for Fixed Effects
>
> Effect
>
> F_mass_handled_or
>
> Estimate
>
> Standard
> Error
>
> DF
>
> t Value
>
> Pr > |t|
>
> Intercept
>
>
>
> -139.29
>
> 83.6162
>
> 92.1
>
> -1.67
>
> 0.0991
>
> F_High_Exp
>
>
>
> -180.24
>
> 102.72
>
> 92.6
>
> -1.75
>
> 0.0826
>
> F_Mat_Type_SW
>
>
>
> 470.30
>
> 261.53
>
> 92.1
>
> 1.80
>
> 0.0754
>
> F_Mat_Type_CNF
>
>
>
> -636.35
>
> 347.99
>
> 92.1
>
> -1.83
>
> 0.0707
>
> F_Mat_Form_Dry
>
>
>
> 662.26
>
> 368.75
>
> 92
>
> 1.80
>
> 0.0758
>
> F_Mat_Form_Liq
>
>
>
> -583.14
>
> 318.54
>
> 92
>
> -1.83
>
> 0.0704
>
> F_Mat_Form_Comp
>
>
>
> 598.77
>
> 331.85
>
> 92
>
> 1.80
>
> 0.0745
>
> F_Hybrid
>
>
>
> -1197.69
>
> 658.23
>
> 92
>
> -1.82
>
> 0.0721
>
> F_Primary
>
>
>
> -639.93
>
> 352.80
>
> 92
>
> -1.81
>
> 0.0730
>
> F_Coatings
>
>
>
> 92.7949
>
> 50.7416
>
> 92.1
>
> 1.83
>
> 0.0707
>
> F_mass_handled_or
>
> F2
>
> 134.91
>
> 77.2496
>
> 92
>
> 1.75
>
> 0.0841
>
> F_mass_handled_or
>
> F3
>
> -1235.37
>
> 677.42
>
> 92
>
> -1.82
>
> 0.0715
>
> F_mass_handled_or
>
> F4
>
> 137.71
>
> 75.9370
>
> 92
>
> 1.81
>
> 0.0730
>
> F_mass_handled_or
>
> F1
>
> 0
>
> .
>
> .
>
> .
>
> .
>
> F_adequate
>
>
>
> 139.45
>
> 76.5607
>
> 92
>
> 1.82
>
> 0.0718
>
> F_inadequate
>
>
>
> 1395.04
>
> 767.40
>
> 92
>
> 1.82
>
> 0.0723
>
> F_emp_sc
>
>
>
> -1259.21
>
> 691.98
>
> 92
>
> -1.82
>
> 0.0721
>
> F_diam_sc
>
>
>
> -20.1875
>
> 9.9710
>
> 89.5
>
> -2.02
>
> 0.0459
>
>
> mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID)  + F_High_Exp + F_Mat_Type_SW +
>                       F_Mat_Type_CNF + F_Mat_Form_Dry + F_Mat_Form_Liq + F_Mat_Form_Comp +
>                       F_Hybrid + F_Primary + F_Coatings + F_adequate +
>                       F_inadequate + F_emp_sc + F_diam_sc,
>                     data = Modeling_Database_Final)
> summary(mixedidsite)
>
> REML criterion at convergence: 961.8
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -2.3019 -0.5248 -0.1668  0.3511  4.7091
>
> Random effects:
> Groups                              Name        Variance Std.Dev.
> NIOSHID:NewSiteID       (Intercept) 0.6954   0.8339
>  NewSiteID                        (Intercept) 1.5932   1.2622
>  Residual                                                2.5372   1.5929
> Number of obs: 252, groups:  NIOSHID:NewSiteID, 127; NewSiteID, 15
>
> Fixed effects:
>                                                            Estimate Std. Error t value
> (Intercept)                                       -139.292     83.702  -1.664
> F_High_Exp                                      -180.239    102.778  -1.754
> F_Mat_Type_SW                            470.297    261.537   1.798
> F_Mat_Type_CNF                           -636.352    347.993  -1.829
> F_Mat_Form_Dry                            662.263    368.761   1.796
> F_Mat_Form_Liq                            -583.142    318.541  -1.831
> F_Mat_Form_Comp                       598.774    331.856   1.804
> F_Hybrid                                           -1197.691    658.229  -1.820
> F_Primary                                        -639.928    352.811  -1.814
> F_Coatings                                       92.795     50.804   1.826
> F_mass_handled_orF2: 10 -         134.912     77.291   1.746
> F_mass_handled_orF3: 101         -1235.372    677.423  -1.824
> F_mass_handled_orF4: >1 k        137.714     75.958   1.813
> F_adequate                                     139.446     76.582   1.821
> F_inadequate                                  1395.036    767.413   1.818
> F_emp_sc                                         -1259.213    691.979  -1.820
> F_diam_sc                                        -20.188      9.971  -2.025
>
> Again, the SAS results make sense...that there is 0 variance left over from the fully specified fixed effects. I am fairly certain the combination of the fixed effects uniquely identifies each facility. However, why doesn't R give a 0 variance? What is different between the two methods?
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



More information about the R-sig-mixed-models mailing list