[R-sig-ME] Mixed Models in SAS and R

Bertke, Stephen (CDC/NIOSH/DSHEFS) inh4 at cdc.gov
Wed Feb 14 17:07:34 CET 2018


Hello everyone. I have just joined this mailing list so I wanted to introduce myself as well as ask a question.

My name is Steve Bertke and I am a researcher at the National Institute for Occupational Safety and Health (NIOSH) which is one of the centers within the Centers for Disease Control and Prevention (CDC). I am a long-time SAS user but have been slowly finding myself using and liking R. I have found the support community for R very helpful and rewarding and am looking forward to contributing my part.

On to my question.

I have ran the same model (I think) in both SAS (proc mixed) and R (lmer) but have gotten different results for the random terms.

A quick background, we took 2 personal air samples for 127 people at 15 different factories for a total of 252 samples (after dropping 2 samples). We are trying to model various factors of the factory on the air samples. To do so, we need to control for the repeated measure of the person (nested within factory) as well as the random factory effect.

You can see below for the exact code and exact results, but in short, when I run the models in R and SAS without any fixed effects, I get the same results for the random effects. When I enter in the fixed effects, I get a different result for the Site variance...but all other results are the same (both the fixed effects and the other random effects).

There is another methodological issue with the model I am running. I am entering in 17 fixed effects that describe the 15 sites. Therefore, the model is over-specified. As a result, SAS gives a variance estimate of 0 for site (which, in hindsight, makes sense) however, R does not. That is a separate issue that we are dealing with, but I would expect that R and SAS would give the same result. Or maybe at least a warning.

Below is my code and output. I hope the formatting remains so that it is easily readable for everyone. I may also be able to share the data too, but I need to get some approval for that first.

Details
I ran the following code in SAS and R without any fixed effects and both give the same results:

proc mixed data=dat;
class NewSiteID NIOSHID;
model ln_i = ;
random NewSiteID NIOSHID(NewSiteID);
run;

Covariance Parameter Estimates

Cov Parm

Estimate

NewSiteID

6.3433

NIOSHID(NewSiteID)

0.7465

Residual

2.5256


mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID),
                    data = Modeling_Database_Final)
summary(mixedidsite)

Random effects:
Groups                              Name        Variance Std.Dev.
NIOSHID:NewSiteID       (Intercept) 0.7465   0.864
 NewSiteID                        (Intercept) 6.3434   2.519
 Residual                                                 2.5256   1.589

However, when I add in the fixed effects, I get different results. SAS gives an estimate of 0 for Site while R does not. All other results are the same:

proc mixed data=dat;
class NewSiteID NIOSHID F_mass_handled_or (ref=first);
model ln_i = F_High_Exp
F_Mat_Type_SW
F_Mat_Type_CNF
F_Mat_Form_Dry
F_Mat_Form_Liq
F_Mat_Form_Comp
F_Hybrid
F_Primary
F_Coatings
F_mass_handled_or
F_adequate
F_inadequate
F_emp_sc
F_diam_sc/solution;
random NewSiteID NIOSHID(NewSiteID);
run;

Covariance Parameter Estimates

Cov Parm

Estimate

NewSiteID

0

NIOSHID(NewSiteID)

0.6954

Residual

2.5372


Fit Statistics

-2 Res Log Likelihood

961.8

AIC (Smaller is Better)

965.8

AICC (Smaller is Better)

965.9

BIC (Smaller is Better)

967.3


Solution for Fixed Effects

Effect

F_mass_handled_or

Estimate

Standard
Error

DF

t Value

Pr > |t|

Intercept



-139.29

83.6162

92.1

-1.67

0.0991

F_High_Exp



-180.24

102.72

92.6

-1.75

0.0826

F_Mat_Type_SW



470.30

261.53

92.1

1.80

0.0754

F_Mat_Type_CNF



-636.35

347.99

92.1

-1.83

0.0707

F_Mat_Form_Dry



662.26

368.75

92

1.80

0.0758

F_Mat_Form_Liq



-583.14

318.54

92

-1.83

0.0704

F_Mat_Form_Comp



598.77

331.85

92

1.80

0.0745

F_Hybrid



-1197.69

658.23

92

-1.82

0.0721

F_Primary



-639.93

352.80

92

-1.81

0.0730

F_Coatings



92.7949

50.7416

92.1

1.83

0.0707

F_mass_handled_or

F2

134.91

77.2496

92

1.75

0.0841

F_mass_handled_or

F3

-1235.37

677.42

92

-1.82

0.0715

F_mass_handled_or

F4

137.71

75.9370

92

1.81

0.0730

F_mass_handled_or

F1

0

.

.

.

.

F_adequate



139.45

76.5607

92

1.82

0.0718

F_inadequate



1395.04

767.40

92

1.82

0.0723

F_emp_sc



-1259.21

691.98

92

-1.82

0.0721

F_diam_sc



-20.1875

9.9710

89.5

-2.02

0.0459


mixedidsite <- lmer(ln_i ~ (1 | NewSiteID/NIOSHID)  + F_High_Exp + F_Mat_Type_SW +
                      F_Mat_Type_CNF + F_Mat_Form_Dry + F_Mat_Form_Liq + F_Mat_Form_Comp +
                      F_Hybrid + F_Primary + F_Coatings + F_adequate +
                      F_inadequate + F_emp_sc + F_diam_sc,
                    data = Modeling_Database_Final)
summary(mixedidsite)

REML criterion at convergence: 961.8

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.3019 -0.5248 -0.1668  0.3511  4.7091

Random effects:
Groups                              Name        Variance Std.Dev.
NIOSHID:NewSiteID       (Intercept) 0.6954   0.8339
 NewSiteID                        (Intercept) 1.5932   1.2622
 Residual                                                2.5372   1.5929
Number of obs: 252, groups:  NIOSHID:NewSiteID, 127; NewSiteID, 15

Fixed effects:
                                                           Estimate Std. Error t value
(Intercept)                                       -139.292     83.702  -1.664
F_High_Exp                                      -180.239    102.778  -1.754
F_Mat_Type_SW                            470.297    261.537   1.798
F_Mat_Type_CNF                           -636.352    347.993  -1.829
F_Mat_Form_Dry                            662.263    368.761   1.796
F_Mat_Form_Liq                            -583.142    318.541  -1.831
F_Mat_Form_Comp                       598.774    331.856   1.804
F_Hybrid                                           -1197.691    658.229  -1.820
F_Primary                                        -639.928    352.811  -1.814
F_Coatings                                       92.795     50.804   1.826
F_mass_handled_orF2: 10 -         134.912     77.291   1.746
F_mass_handled_orF3: 101         -1235.372    677.423  -1.824
F_mass_handled_orF4: >1 k        137.714     75.958   1.813
F_adequate                                     139.446     76.582   1.821
F_inadequate                                  1395.036    767.413   1.818
F_emp_sc                                         -1259.213    691.979  -1.820
F_diam_sc                                        -20.188      9.971  -2.025

Again, the SAS results make sense...that there is 0 variance left over from the fully specified fixed effects. I am fairly certain the combination of the fixed effects uniquely identifies each facility. However, why doesn't R give a 0 variance? What is different between the two methods?


	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list