[R-sig-ME] gls() results for unbalanced data with corStruct correlation class change if Initialize() is called first

Mychaleckyj, Josyf C (jcm6t) jcm6t @end|ng |rom v|rg|n|@@edu
Thu Jul 16 17:21:37 CEST 2020


I’ve encountered a strange situation where the results of a gls model with a corStruct correlation structure are affected by whether Initialize() is explicitly called before the gls() model is executed.

This seems to be restricted to unbalanced data sets.  I have pasted a test case using the Orthodont data set and corCompSymm below, although I first noticed it while running a custom corStruct class so it would appear to be a more general problem than this test case as you might expect. I have not tested this in lme().

I’m aware that calling Initialize() outside of gls or lme is generally discouraged but was surprised that it changed the analysis results. You could imagine calling it in an interactive modeling session to view the correlation structure without knowing that you might have changed future results. I would have expected the results to be identical.

I’d also like to know - which is the “correct” result or are they considered practically equivalent, although the differences seem quite large for the latter.

Just to the clear, this is not about the relative merits of balanced vs unbalanced data. Its about the veracity of the results.

Thanks,
Joe.



R version 3.6.0 (Linux CentOS 7)

> library(nlme)
>
> packageVersion('nlme')
[1] ‘3.1.147’
>
> #—————————————————————
> # the values are fixed for all models
> rho <- 0.3
> fixform <- distance ~ age + factor(Sex)
>
> # remove the grouping and other classes to keep it simple
> # full balanced data set
> Obal<-as.data.frame(Orthodont)
> class(Obal)
[1] "data.frame"
>
> # create unbalanced Orthodont data set with no singletons
> Ounbal <-as.data.frame(Orthodont[c(3,4,5, 7,8, 11, 12, 15, 16, 19, 20:28,31,32,35,36,39,
+ 40,41,43,44,47,48,51,52,53,55,56,59,60,63,64,67,68,71,72,75,76,
+ 79,80,83:85,87,88,91,92,95,96,99:101,103:105,107,108),])
>
> # sparser unbalanced data set with singletons
> Ounbal2<-as.data.frame(Orthodont[c(1,5,9,13,17,18,21:23,30:31,44,45:47,56:58,65:71,74:76,77,81,85,89,90,
+  93:94, 97,101:108),])
>
> Subjs<-c(paste0(c("M"),sprintf("%02d",1:16)), paste0(c("F"),sprintf("%02d",1:11)))
>
> table(Obal$Subject)[Subjs]

M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
  4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
F05 F06 F07 F08 F09 F10 F11
  4   4   4   4   4   4   4
> table(Ounbal$Subject)[Subjs]

M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
  2   3   2   2   2   4   4   2   2   2   3   2   2   3   2   2   2   2   2   2
F05 F06 F07 F08 F09 F10 F11
  2   3   2   2   2   3   3
> table(Ounbal2$Subject)[Subjs]

 M01  M02  M03  M04  M05  M06 <NA>  M08 <NA> <NA>  M11  M12 <NA>  M14  M15 <NA>
   1    1    1    1    2    3         2              1    3         1    2
 F01  F02  F03  F04  F05  F06  F07  F08  F09  F10  F11
   4    3    3    1    1    1    2    2    1    4    4
>
> #----
> # test with the full balanced data set
> # pre-calling Initialize doesn't seem to make a difference
> csB <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csB<-Initialize(csB, data=Obal)
>
> print(mod<-gls(fixform, correlation=csB, data=Obal))
Generalized least squares fit by REML
  Model: fixform
  Data: Obal
  Log-restricted-likelihood: -218.7563

Coefficients:
(Intercept)         age   SexFemale
 17.7067130   0.6601852  -2.3210227

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.6144908
Degrees of freedom: 108 total; 105 residual
Residual standard error: 2.305696
>
> # compare with direct call
> print(mod2<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Obal) )
Generalized least squares fit by REML
  Model: fixform
  Data: Obal
  Log-restricted-likelihood: -218.7563

Coefficients:
(Intercept)         age   SexFemale
 17.7067130   0.6601852  -2.3210227

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.6144908
Degrees of freedom: 108 total; 105 residual
Residual standard error: 2.305696
>
> #--------------
> # compare Unbalanced:
> csU <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csU<-Initialize(csU, data=Ounbal)
>
> print(mod3<-gls(fixform, correlation=csU, data=Ounbal) )
Generalized least squares fit by REML
  Model: fixform
  Data: Ounbal
  Log-restricted-likelihood: -130.7703

Coefficients:
(Intercept)         age   SexFemale
  18.046228    0.659606   -3.262389

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.7211814
Degrees of freedom: 64 total; 61 residual
Residual standard error: 2.378517
>
> # compare with direct call
> print(mod4<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal) )
Generalized least squares fit by REML
  Model: fixform
  Data: Ounbal
  Log-restricted-likelihood: -127.616

Coefficients:
(Intercept)         age   SexFemale
  18.330637    0.634451   -2.947638

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.7598463
Degrees of freedom: 64 total; 61 residual
Residual standard error: 2.34705
>
> #----
> # compare with Unbalanced 2 with singletons
> csU2 <- corCompSymm(value = rho, form = ~ 1 | Subject)
> csU2<-Initialize(csU2, data=Ounbal2)
>
> print(mod5<-gls(fixform, correlation=csU2, data=Ounbal2))
Generalized least squares fit by REML
  Model: fixform
  Data: Ounbal2
  Log-restricted-likelihood: -74.02154

Coefficients:
(Intercept)         age   SexFemale
 18.5018545   0.5265219  -1.4587968

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.8736732
Degrees of freedom: 44 total; 41 residual
Residual standard error: 1.94714
>
> # compare with direct call
> print(mod6<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal2))
Generalized least squares fit by REML
  Model: fixform
  Data: Ounbal2
  Log-restricted-likelihood: -77.29477

Coefficients:
(Intercept)         age   SexFemale
 18.4979634   0.5550817  -1.6893916

Correlation Structure: Compound symmetry
 Formula: ~1 | Subject
 Parameter estimate(s):
      Rho
0.8516777
Degrees of freedom: 44 total; 41 residual
Residual standard error: 2.024012
>



More information about the R-sig-mixed-models mailing list