[R-sig-ME] gls() results for unbalanced data with corStruct correlation class change if Initialize() is called first
Mychaleckyj, Josyf C (jcm6t)
jcm6t @end|ng |rom v|rg|n|@@edu
Thu Jul 16 17:21:37 CEST 2020
I’ve encountered a strange situation where the results of a gls model with a corStruct correlation structure are affected by whether Initialize() is explicitly called before the gls() model is executed.
This seems to be restricted to unbalanced data sets. I have pasted a test case using the Orthodont data set and corCompSymm below, although I first noticed it while running a custom corStruct class so it would appear to be a more general problem than this test case as you might expect. I have not tested this in lme().
I’m aware that calling Initialize() outside of gls or lme is generally discouraged but was surprised that it changed the analysis results. You could imagine calling it in an interactive modeling session to view the correlation structure without knowing that you might have changed future results. I would have expected the results to be identical.
I’d also like to know - which is the “correct” result or are they considered practically equivalent, although the differences seem quite large for the latter.
Just to the clear, this is not about the relative merits of balanced vs unbalanced data. Its about the veracity of the results.
Thanks,
Joe.
R version 3.6.0 (Linux CentOS 7)
> library(nlme)
>
> packageVersion('nlme')
[1] ‘3.1.147’
>
> #—————————————————————
> # the values are fixed for all models
> rho <- 0.3
> fixform <- distance ~ age + factor(Sex)
>
> # remove the grouping and other classes to keep it simple
> # full balanced data set
> Obal<-as.data.frame(Orthodont)
> class(Obal)
[1] "data.frame"
>
> # create unbalanced Orthodont data set with no singletons
> Ounbal <-as.data.frame(Orthodont[c(3,4,5, 7,8, 11, 12, 15, 16, 19, 20:28,31,32,35,36,39,
+ 40,41,43,44,47,48,51,52,53,55,56,59,60,63,64,67,68,71,72,75,76,
+ 79,80,83:85,87,88,91,92,95,96,99:101,103:105,107,108),])
>
> # sparser unbalanced data set with singletons
> Ounbal2<-as.data.frame(Orthodont[c(1,5,9,13,17,18,21:23,30:31,44,45:47,56:58,65:71,74:76,77,81,85,89,90,
+ 93:94, 97,101:108),])
>
> Subjs<-c(paste0(c("M"),sprintf("%02d",1:16)), paste0(c("F"),sprintf("%02d",1:11)))
>
> table(Obal$Subject)[Subjs]
M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
F05 F06 F07 F08 F09 F10 F11
4 4 4 4 4 4 4
> table(Ounbal$Subject)[Subjs]
M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
2 3 2 2 2 4 4 2 2 2 3 2 2 3 2 2 2 2 2 2
F05 F06 F07 F08 F09 F10 F11
2 3 2 2 2 3 3
> table(Ounbal2$Subject)[Subjs]
M01 M02 M03 M04 M05 M06 <NA> M08 <NA> <NA> M11 M12 <NA> M14 M15 <NA>
1 1 1 1 2 3 2 1 3 1 2
F01 F02 F03 F04 F05 F06 F07 F08 F09 F10 F11
4 3 3 1 1 1 2 2 1 4 4
>
> #----
> # test with the full balanced data set
> # pre-calling Initialize doesn't seem to make a difference
> csB <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csB<-Initialize(csB, data=Obal)
>
> print(mod<-gls(fixform, correlation=csB, data=Obal))
Generalized least squares fit by REML
Model: fixform
Data: Obal
Log-restricted-likelihood: -218.7563
Coefficients:
(Intercept) age SexFemale
17.7067130 0.6601852 -2.3210227
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.6144908
Degrees of freedom: 108 total; 105 residual
Residual standard error: 2.305696
>
> # compare with direct call
> print(mod2<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Obal) )
Generalized least squares fit by REML
Model: fixform
Data: Obal
Log-restricted-likelihood: -218.7563
Coefficients:
(Intercept) age SexFemale
17.7067130 0.6601852 -2.3210227
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.6144908
Degrees of freedom: 108 total; 105 residual
Residual standard error: 2.305696
>
> #--------------
> # compare Unbalanced:
> csU <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csU<-Initialize(csU, data=Ounbal)
>
> print(mod3<-gls(fixform, correlation=csU, data=Ounbal) )
Generalized least squares fit by REML
Model: fixform
Data: Ounbal
Log-restricted-likelihood: -130.7703
Coefficients:
(Intercept) age SexFemale
18.046228 0.659606 -3.262389
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.7211814
Degrees of freedom: 64 total; 61 residual
Residual standard error: 2.378517
>
> # compare with direct call
> print(mod4<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal) )
Generalized least squares fit by REML
Model: fixform
Data: Ounbal
Log-restricted-likelihood: -127.616
Coefficients:
(Intercept) age SexFemale
18.330637 0.634451 -2.947638
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.7598463
Degrees of freedom: 64 total; 61 residual
Residual standard error: 2.34705
>
> #----
> # compare with Unbalanced 2 with singletons
> csU2 <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csU2<-Initialize(csU2, data=Ounbal2)
>
> print(mod5<-gls(fixform, correlation=csU2, data=Ounbal2))
Generalized least squares fit by REML
Model: fixform
Data: Ounbal2
Log-restricted-likelihood: -74.02154
Coefficients:
(Intercept) age SexFemale
18.5018545 0.5265219 -1.4587968
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.8736732
Degrees of freedom: 44 total; 41 residual
Residual standard error: 1.94714
>
> # compare with direct call
> print(mod6<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal2))
Generalized least squares fit by REML
Model: fixform
Data: Ounbal2
Log-restricted-likelihood: -77.29477
Coefficients:
(Intercept) age SexFemale
18.4979634 0.5550817 -1.6893916
Correlation Structure: Compound symmetry
Formula: ~1 | Subject
Parameter estimate(s):
Rho
0.8516777
Degrees of freedom: 44 total; 41 residual
Residual standard error: 2.024012
>
More information about the R-sig-mixed-models
mailing list