[R-sig-ME] Singularity and the 3 Level Hierarchy
Kim Pearce
k|m@pe@rce @end|ng |rom newc@@t|e@@c@uk
Mon Feb 24 19:15:19 CET 2025
Dear Kalman,
Thank you so much for your reply to the question I sent to the list last week.
Just a follow up to some of your points…
~I don’t see any reason to use REML = FALSE in your case. Given what you described, I’d definitely stick with REML = TRUE—it’s the better choice when you want to properly account for the random effect structure.
~anova() or AIC() can be used to compare models with different random effect structures.
I have read that REML is said to provide more accurate estimates of random variances whereas ML is said to produce more accurate estimates of fixed regression parameters.
Additionally, I have seen the following stipulated:
1. REML to compare models with nested random effects and the same fixed effect structure
2. ML to compare models with nested fixed effects and the same random effect structure
3. ML to compare models with and without random effects
For each of 1,2 and 3 above I assume that “model comparison” can be done via the comparison of, for example, AIC values where “smaller is better”. However, Andy Field in his text “Discovering Statistics Using IBM SPSS Statistics” (2013, Edition 4: page 826 & page 835) only uses ML when comparing two linear mixed models via the likelihood ratio test (he states that the likelihood ratio test “works only if full maximum likelihood estimation is used (and not REML)”.
In fact, in the past, when I have used the anova() function in R to compare linear mixed models which have been fitted using REML=TRUE, R stipulates in the output that the models are “refitted with ML (instead of REML)” before comparison takes place.
Additionally, Field stipulates that the likelihood ratio test requires that the “new model must contain all of the effects of the older model” i.e. that the models are nested.
~A worked example would help.
~That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it’s fine to just drop that parameter—unless you have a strong scientific reason to keep it.
My example was hypothetical but , say, we had j subjects and k cells in total for these j subjects and we ran the model:
Model1<-lmer(Y~Groupf+(1|Subject/Cell),REML=FALSE,data=file1)
which gave the following:
Random effects
Groups Name Variance
Cell:Subject (Intercept) 2.049e-01
Subject (Intercept) 2.978e-09
We can definitely see singularity here (i.e. the estimated random intercept variance is virtually zero for the j intercepts at the Subject level and approaching zero for the k intercepts at the Subject x Cell level).
In the above, it looks as if we could try dropping the random intercepts at the Subject level in order to get a non singular fit i.e. fit:
Model2<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1)
I was just concerned that this 2 level model implies that the “top level” of the structure is Cell when, in truth, the “top level” is Subject (as cells are nested within subjects in my hypothetical study). Your message implies that Model2 above would, in fact, still be valid.
I would be interested to hear if you (or anyone else) have any further views.
Kindest regards,
Kim
-----Original Message-----
From: kalman.toth <kalman.toth using protonmail.com>
Sent: 22 February 2025 09:19
To: Kim Pearce <kim.pearce using newcastle.ac.uk>
Cc: r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] Singularity and the 3 Level Hierarchy
⚠ External sender. Take care when opening links or attachments. Do not provide your login details.
Hi Kim,
I am just a scientist who often works with repeated measures data and not a statistician but I would like to add a few comments and others might amend those or add more.
1) I don’t see any reason to use REML = FALSE in your case. Given what you described, I’d definitely stick with REML = TRUE—it’s the better choice when you want to properly account for the random effect structure.
2) A work example would help.
3) What model you use should be primarily based on your experimental design and your scientific knowledge of the field.
4) That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it’s fine to just drop that parameter—unless you have a strong scientific reason to keep it. Also, if your experimental design is fully nested, you might want to try adding the interaction term ('Subject:Cell') instead of just 'Cell'.
5) anova() or AIC() can be used to compare models with different random effect structures.
Best Regards,
Kalman Toth
On Thursday, February 20th, 2025 at 1:14 PM, Kim Pearce via R-sig-mixed-models <r-sig-mixed-models using r-project.org<mailto:r-sig-mixed-models using r-project.org>> wrote:
> Hello everyone,
>
>
>
> If I may, I would like to ask about the appropriateness of a particular model when the original design takes on a 3 level hierarchical structure.
>
>
>
> Say we have N subjects and each of these subjects has several cells. Within each cell there are mitochondria and a continuous measurement (Y) is recorded for each mitochondrion. In this design, mitochondria (level 1) are nested within cells (level 2) and cells are nested within subjects (level 3).
>
>
>
> For ease, say we are interested in establishing if patient disease group is related to Y. There are 3 patient disease groups recorded in the factor variable "Groupf". In addition, variable "Subject" identifies each of the N subjects and variable "Cell" identifies the cells (within the subjects).
>
>
>
> Consider the syntax:
>
>
>
> Model1<-lmer(Y~Groupf+(1|Subject/Cell),REML=FALSE,data=file1)
>
>
>
> The above would give us a fixed effect for Groupf as well as intercepts for the N subjects and intercepts for the Subject x Cell combinations.
>
>
>
> Hypothetically, say we found evidence of singularity (i.e. the estimated random intercept variance was near zero at the Subject and Subject x Cell levels), however, singularity was not flagged for a 2 level hierarchy (where mitochondria are nested within cells). Would it be valid to report such a 2 level model (i.e. where the "top level" was Cell) ?
>
>
>
> Model1<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1)
>
>
>
> or would it be more preferable to always consider Subject as the "top level" if singularity was not flagged in such a model i.e.
>
>
>
> Model1<-lmer(Y~Groupf+(1|Subject),REML=FALSE,data=file1)
>
>
>
> Many thanks for your appreciated views on this issue in advance.
>
> Kind regards,
>
> Kim
>
>
>
> Dr Kim Pearce PhD, CStat, Fellow HEA
> Senior Statistician
> Faculty of Medical Sciences Graduate School Room 3.14 3rd Floor Ridley
> Building 1 Newcastle University Queen Victoria Road Newcastle Upon
> Tyne
> NE1 7RU
>
> Tel: (0044) (0)191 208 8142
>
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> .ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-mixed-models&data=05%7C02%7Ckim.
> pearce%40newcastle.ac.uk%7Cb1d7b9423e284768ff1408dd5321f615%7C9c5012c9
> b61644c2a91766814fbe3e87%7C1%7C0%7C638758127525031103%7CUnknown%7CTWFp
> bGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIk
> FOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=duLSYpH3sZlu6GNE%2BR
> KtDtvOpTB01VSz6s7wfY0ZYRg%3D&reserved=0
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list