[R-sig-ME] gls() results for unbalanced data with corStruct correlation class change if Initialize() is called first

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Sat Jul 18 22:19:02 CEST 2020


[keeping r-sig-mixed-models in Cc: in case anyone wants to follow along]

I think this is worth a bug report. From https://www.r-project.org/bugs.html

 > In order to get a bugzilla account (i.e., become “member”), please 
send an e-mail (from the address you want to use as your login) to 
|bug-report-request using r-project.org| briefly explaining why, and a 
volunteer will add you to R’s Bugzilla members.

   There are lots of warnings on the page about doing your due diligence 
before reporting a bug, but I think you have ...

   While this is definitely worth reporting, I don't know if it will get 
fixed quickly. It's my impression that nlme is semi-orphaned - i.e. it's 
getting patched to maintain compatibility with changes in R, and some 
cleanup/technical debt payoff, but there are e.g. 14 bugs reported at 
https://bugs.r-project.org/bugzilla/buglist.cgi?quicksearch=lme&list_id=14586

See also https://svn.r-project.org/R-packages/trunk/nlme/ for 
development status/pace.

   cheers

     Ben Bolker


On 7/18/20 10:53 AM, Mychaleckyj, Josyf C (jcm6t) wrote:
> Ben,
> Thanks for your response. This caused no end of problems in our custom corStruct class because we assumed it was in our custom code but we could not see how. It’s even more puzzling because the first statement in Initialize.corCompSymm should explicitly handle repeated calls to the function through testing the “inf” attribute.
>
> What is the protocol now ? Should I file this ? I don’t have an R Bugzilla account.
>
> Thanks,  Joe.
>
>> On Jul 16, 2020, at 11:21 AM, Mychaleckyj, Josyf C (jcm6t) <jcm6t using virginia.edu> wrote:
>>
>>
>> I’ve encountered a strange situation where the results of a gls model with a corStruct correlation structure are affected by whether Initialize() is explicitly called before the gls() model is executed.
>>
>> This seems to be restricted to unbalanced data sets.  I have pasted a test case using the Orthodont data set and corCompSymm below, although I first noticed it while running a custom corStruct class so it would appear to be a more general problem than this test case as you might expect. I have not tested this in lme().
>>
>> I’m aware that calling Initialize() outside of gls or lme is generally discouraged but was surprised that it changed the analysis results. You could imagine calling it in an interactive modeling session to view the correlation structure without knowing that you might have changed future results. I would have expected the results to be identical.
>>
>> I’d also like to know - which is the “correct” result or are they considered practically equivalent, although the differences seem quite large for the latter.
>>
>> Just to the clear, this is not about the relative merits of balanced vs unbalanced data. Its about the veracity of the results.
>>
>> Thanks,
>> Joe.
>>
>>
>>
>> R version 3.6.0 (Linux CentOS 7)
>>
>>> library(nlme)
>>>
>>> packageVersion('nlme')
>> [1] ‘3.1.147’
>>> #—————————————————————
>>> # the values are fixed for all models
>>> rho <- 0.3
>>> fixform <- distance ~ age + factor(Sex)
>>>
>>> # remove the grouping and other classes to keep it simple
>>> # full balanced data set
>>> Obal<-as.data.frame(Orthodont)
>>> class(Obal)
>> [1] "data.frame"
>>> # create unbalanced Orthodont data set with no singletons
>>> Ounbal <-as.data.frame(Orthodont[c(3,4,5, 7,8, 11, 12, 15, 16, 19, 20:28,31,32,35,36,39,
>> + 40,41,43,44,47,48,51,52,53,55,56,59,60,63,64,67,68,71,72,75,76,
>> + 79,80,83:85,87,88,91,92,95,96,99:101,103:105,107,108),])
>>> # sparser unbalanced data set with singletons
>>> Ounbal2<-as.data.frame(Orthodont[c(1,5,9,13,17,18,21:23,30:31,44,45:47,56:58,65:71,74:76,77,81,85,89,90,
>> +  93:94, 97,101:108),])
>>> Subjs<-c(paste0(c("M"),sprintf("%02d",1:16)), paste0(c("F"),sprintf("%02d",1:11)))
>>>
>>> table(Obal$Subject)[Subjs]
>> M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
>>   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
>> F05 F06 F07 F08 F09 F10 F11
>>   4   4   4   4   4   4   4
>>> table(Ounbal$Subject)[Subjs]
>> M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12 M13 M14 M15 M16 F01 F02 F03 F04
>>   2   3   2   2   2   4   4   2   2   2   3   2   2   3   2   2   2   2   2   2
>> F05 F06 F07 F08 F09 F10 F11
>>   2   3   2   2   2   3   3
>>> table(Ounbal2$Subject)[Subjs]
>> M01  M02  M03  M04  M05  M06 <NA>  M08 <NA> <NA>  M11  M12 <NA>  M14  M15 <NA>
>>    1    1    1    1    2    3         2              1    3         1    2
>> F01  F02  F03  F04  F05  F06  F07  F08  F09  F10  F11
>>    4    3    3    1    1    1    2    2    1    4    4
>>> #----
>>> # test with the full balanced data set
>>> # pre-calling Initialize doesn't seem to make a difference
>>> csB <- corCompSymm(value = rho, form = ~ 1 | Subject)
>>> csB<-Initialize(csB, data=Obal)
>>>
>>> print(mod<-gls(fixform, correlation=csB, data=Obal))
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Obal
>>   Log-restricted-likelihood: -218.7563
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>> 17.7067130   0.6601852  -2.3210227
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.6144908
>> Degrees of freedom: 108 total; 105 residual
>> Residual standard error: 2.305696
>>> # compare with direct call
>>> print(mod2<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Obal) )
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Obal
>>   Log-restricted-likelihood: -218.7563
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>> 17.7067130   0.6601852  -2.3210227
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.6144908
>> Degrees of freedom: 108 total; 105 residual
>> Residual standard error: 2.305696
>>> #--------------
>>> # compare Unbalanced:
>>> csU <- corCompSymm(value = rho, form = ~ 1 | Subject)
>>> csU<-Initialize(csU, data=Ounbal)
>>>
>>> print(mod3<-gls(fixform, correlation=csU, data=Ounbal) )
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Ounbal
>>   Log-restricted-likelihood: -130.7703
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>>   18.046228    0.659606   -3.262389
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.7211814
>> Degrees of freedom: 64 total; 61 residual
>> Residual standard error: 2.378517
>>> # compare with direct call
>>> print(mod4<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal) )
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Ounbal
>>   Log-restricted-likelihood: -127.616
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>>   18.330637    0.634451   -2.947638
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.7598463
>> Degrees of freedom: 64 total; 61 residual
>> Residual standard error: 2.34705
>>> #----
>>> # compare with Unbalanced 2 with singletons
>>> csU2 <- corCompSymm(value = rho, form = ~ 1 | Subject)
>>> csU2<-Initialize(csU2, data=Ounbal2)
>>>
>>> print(mod5<-gls(fixform, correlation=csU2, data=Ounbal2))
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Ounbal2
>>   Log-restricted-likelihood: -74.02154
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>> 18.5018545   0.5265219  -1.4587968
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.8736732
>> Degrees of freedom: 44 total; 41 residual
>> Residual standard error: 1.94714
>>> # compare with direct call
>>> print(mod6<-gls(fixform, correlation=corCompSymm(value = rho, form = ~ 1 | Subject), data=Ounbal2))
>> Generalized least squares fit by REML
>>   Model: fixform
>>   Data: Ounbal2
>>   Log-restricted-likelihood: -77.29477
>>
>> Coefficients:
>> (Intercept)         age   SexFemale
>> 18.4979634   0.5550817  -1.6893916
>>
>> Correlation Structure: Compound symmetry
>> Formula: ~1 | Subject
>> Parameter estimate(s):
>>       Rho
>> 0.8516777
>> Degrees of freedom: 44 total; 41 residual
>> Residual standard error: 2.024012

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list