[R-sig-ME] Does corSymm() require balanced data?

Mon Mar 15 18:37:18 CET 2021

Dear Joe,

At the risk of revealing something that could be misused (because I agree with Thierry that you are pushing things by trying to fit this model with these data), you can get the model to converge by switching from the default optimizer (nlminb) to optim(), which lme() runs with the BFGS method by default:

fit <- lme(opp ~ time * ccog, random = ~ 1 | id, correlation = corSymm(form = ~ 1 | id), data = dat, control = list(opt = "optim"))

Whether this converges to the global maximum I have not attempted to check.

Maybe this is still useful to know because it might allow you to make a more informed decision about the use of a simpler model. For example:

fit2 <- lme(opp ~ time*ccog, random = ~ 1 | id, correlation = corAR1(form = ~ time), data = dat, control=list(opt="optim"))
anova(fit, fit2)

shows that the corSymm() model does not fit significantly better than the AR1 model.
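
As a rough sketch of what that comparison amounts to (an illustration only, not output from these data): anova() on two lme fits with the same fixed effects reports a likelihood-ratio test. With at most 4 occasions per subject, corSymm() estimates 4 * 3 / 2 = 6 correlations, whereas corAR1() estimates a single one, so the test should have 5 degrees of freedom. The helper names n_cor_symm() and lrt_p() below are hypothetical and not part of nlme.

```r
# Number of free correlations in an unstructured (corSymm) matrix
# for k measurement occasions: one per pair of occasions.
n_cor_symm <- function(k) k * (k - 1) / 2

k <- 4                        # largest number of occasions per subject in this thread
df_lrt <- n_cor_symm(k) - 1   # AR1 has a single correlation parameter

# Hypothetical helper: likelihood-ratio p-value from two log-likelihoods,
# where the "full" model nests the "reduced" one.
lrt_p <- function(ll_full, ll_reduced, df) {
  pchisq(2 * (ll_full - ll_reduced), df = df, lower.tail = FALSE)
}
```

Assuming both models were fitted as above, lrt_p(as.numeric(logLik(fit)), as.numeric(logLik(fit2)), df_lrt) should reproduce the p-value that anova(fit, fit2) prints.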

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces using r-project.org] On Behalf Of Ben Bolker
>Sent: Monday, 15 March, 2021 18:05
>To: r-sig-mixed-models using r-project.org
>Subject: Re: [R-sig-ME] Does corSymm() require balanced data?
>
>On 3/15/21 10:56 AM, Tip But wrote:
>> Dear Thierry,
>>
>> Thank you so much for your insightful comments. May I follow-up on them
>> below in-line:
>>
>>
>> ***"You have too few subjects with 4 observations. Either drop those fourth
>> observations."
>>
>>>>>> Does the above mean that for an unstructured residual correlation
>> matrix, the unique number of measurements (e.g., 3 times, 4 times etc.)
>> must have relatively equal sizes (e.g., 9 subjects with 3 times, 7 subjects
>> with 4 times)?
>
>  Balance is probably less important than the total number with 4
>observations.  If you had 100 subjects with 3 times and 20 subjects with
>4 times you'd probably be fine.
>
>>
>> ***"Or use a different correlation structure. E.g. an AR1:
>>
>> fit_alt <- lme(opp ~ time * ccog, random = ~1 | id,
>>    correlation = corAR1(form = ~ time), data = dat)
>> "
>>
>>>>>> In your above R code, is it necessary to use `corAR1(form = ~ time)`?
>> It seems `corAR1(form = ~1 | id)` gives the same result?
>
>   I believe that form = ~1|id uses the order of the observations in the
>data set as the time index, and the grouping variable from the random
>effect as the grouping variable, so these should indeed be equivalent (I
>think the documentation should state this, but I haven't checked)
>
>   If you **really** want an answer you can tell lme() to return the
>non-converged fit anyway: use control=lmeControl(returnObject=TRUE), but I
>wouldn't trust it.
>
>   It's hard to find another mixed-model package in R that can handle
>this case (unstructured correlation, homogeneous variance).
>
>> On Mon, Mar 15, 2021 at 2:37 AM Thierry Onkelinx <thierry.onkelinx using inbo.be>
>> wrote:
>>
>>> Dear Joe,
>>>
>>> You have too few subjects with 4 observations. Either drop those fourth
>>> observations or use a different correlation structure, e.g. an AR1:
>>>
>>> fit <- lme(
>>>    opp ~ time * ccog, random = ~1 | id,
>>>    correlation = corSymm(), data = dat, subset = time < 3
>>> )
>>>
>>> fit_alt <- lme(
>>>    opp ~ time * ccog, random = ~1 | id,
>>>    correlation = corAR1(form = ~ time), data = dat
>>> )
>>> Best regards,
>>>
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> Vlaamse Overheid / Government of Flanders
>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
>>> FOREST
>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>> thierry.onkelinx using inbo.be
>>> Havenlaan 88 bus 73, 1000 Brussel
>>> www.inbo.be
>>>
>>> Op ma 15 mrt. 2021 om 03:27 schreef Tip But <fswfswt using gmail.com>:
>>>
>>>> Dear Members,
>>>>
>>>> In my longitudinal data below, the first couple of subjects were measured 4
>>>> times but the rest of the subjects were measured 3 times (see data below).
>>>>
>>>> We intend to use an unstructured residual correlation matrix in
>>>> `nlme::lme()`. But our model fails to converge.
>>>>
>>>> Question: Given our data is unbalanced with respect to our grouping
>>>> variable (i.e., `id`), can we use `corSymm()`? And if we do, what would be
>>>> the dimensions of the resultant unstructured residual correlation matrix
>>>> for our data: a 3x3 or a 4x4 matrix?
>>>>
>>>> Thank you for your expertise,
>>>> Joe
>>>>
>>>> # Data and R Code
>>>> dat <- read.csv("https://raw.githubusercontent.com/hkil/m/master/un.csv")
>>>>
>>>> library(nlme)
>>>>
>>>> fit <- lme(opp ~ time * ccog, random = ~ 1 | id,
>>>>            correlation = corSymm(form = ~ 1 | id), data = dat)
>>>>
>>>> Error:
>>>>    nlminb problem, convergence error code = 1
>>>>    message = false convergence (8)