[R-sig-ME] Singularity and the 3 Level Hierarchy

Chris Howden chr|@topher@howden @end|ng |rom @ydney@edu@@u
Mon Feb 24 23:42:38 CET 2025


Hi Kim,

Re the 2 different models random effects - Model1) Cell:Subject vs model 2) 1|Cell

Seems to me that what this is telling us is that there is no evidence of differences between subjects, but there is evidence of differences between cells. So to include that info in the model there are at least 2 ways one might go about it - based on how the cell info is coded up in the data:
1) If the data is coded up so each cell has a different indicator value e.g. subject 1 has cell 1, cell 2 and cell 3, while subject 2 has cell 4 and cell 5, etc. Then you can assign a different random intercept to each cell by including "1|Cell" in the model call.
2) However, if the data is coded up as the cell # for each subject e.g. subject 1 has cell 1, cell 2 and cell 3, while subject 2 has cell 1 and cell 2, etc. Then you need to include this in the model as Cell:Subject to get a different random intercept for each cell. Because if you included the base "Cell" variable it would fit a single random intercept that represents everyone's cell 1, and then a single random intercept that covers everyone's cell 2, etc

So....If I have got that right.....?? I think both of the above methods give the same model i.e. different random intercept for each cell? (which is valid) But how you specify it depends on how cell is coded up in the data. 

As to whether yr Model2<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file12) is valid. Well that depends on how the cell variable is coded up. I suspect it is likely coded up as Point 2) above - meaning it isn't.

Chris Howden | Statistical Lead
Statistical Consulting Unit | Sydney Informatics Hub | Core Research Facilities
+61 410 689 945 | christopher.howden using sydney.edu.au
Statistical Resources and Workflows/Workshops

Acknowledgement of SIH workflows and statistical consulting ensures the continuation of these free services. Please consider the suggested wording on our Acknowledgements page to acknowledge our contributions in any arising publications.

CRICOS 00026A
This email plus any attachments to it are confidential. Any unauthorised use is strictly prohibited. If you receive this email in error, please delete it and any attachments Please think of our environment and only print this email if necessary

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On Behalf Of Kim Pearce via R-sig-mixed-models
Sent: Tuesday, 25 February 2025 5:15 AM
To: kalman.toth <kalman.toth using protonmail.com>
Cc: r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] Singularity and the 3 Level Hierarchy

Dear Kalman,

Thank you so much for your reply to the question I sent to the list last week.

Just a follow up to some of your points…


~I don’t see any reason to use REML = FALSE in your case. Given what you described, I’d definitely stick with REML = TRUE—it’s the better choice when you want to properly account for the random effect structure.
~anova() or AIC() can be used to compare models with different random effect structures.



I have read that REML is said to provide more accurate estimates of random variances whereas ML is said to produce more accurate estimates of fixed regression parameters.



Additionally, I have seen the following stipulated:



1.       REML to compare models with nested random effects and the same fixed effect structure

2.       ML to compare models with nested fixed effects and the same random effect structure

3.       ML to compare models with and without random effects

For each of 1,2 and 3 above I assume that “model comparison” can be done via the comparison of, for example, AIC values where “smaller is better”. However, Andy Field in his text “Discovering Statistics Using IBM SPSS Statistics” (2013, Edition 4: page 826 & page 835) only uses ML when comparing two linear mixed models via the likelihood ratio test (he states that the likelihood ratio test “works only if full maximum likelihood estimation is used (and not REML)”.

In fact, in the past, when I have used the anova() function in R to compare linear mixed models which have been fitted using REML=TRUE, R stipulates in the output that the models are “refitted with ML (instead of REML)” before comparison takes place.



Additionally, Field stipulates that the likelihood ratio test requires that the “new model must contain  all of the effects of the older model” i.e. that the models are nested.



~A worked example would help.

~That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it’s fine to just drop that parameter—unless you have a strong scientific reason to keep it.



My example was hypothetical but , say, we had j subjects and k cells in total for these j subjects and we ran the model:



Model1<-lmer(Y~Groupf+(1|Subject/Cell),REML=FALSE,data=file1)



which gave the following:



Random effects



Groups         Name                    Variance

Cell:Subject  (Intercept)              2.049e-01

Subject       (Intercept)              2.978e-09



We can definitely see singularity here (i.e. the estimated random intercept variance is virtually zero  for the j intercepts at the Subject level and approaching zero for the k intercepts at the Subject x Cell level).



In the above, it looks as if we could try dropping the random intercepts at the Subject level in order to get a non singular fit i.e. fit:


Model2<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1)

I was just concerned that this 2 level model implies that the “top level” of the structure is Cell when, in truth, the “top level” is Subject (as cells are nested within subjects in my hypothetical study).  Your message implies that Model2 above would, in fact, still be valid.

I would be interested to hear if you (or anyone else) have any further views.

Kindest regards,
Kim







-----Original Message-----
From: kalman.toth <kalman.toth using protonmail.com>
Sent: 22 February 2025 09:19
To: Kim Pearce <kim.pearce using newcastle.ac.uk>
Cc: r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] Singularity and the 3 Level Hierarchy



⚠ External sender. Take care when opening links or attachments. Do not provide your login details.



Hi Kim,



I am just a scientist who often works with repeated measures data and not a statistician but I would like to add a few comments and others might amend those or add more.



1) I don’t see any reason to use REML = FALSE in your case. Given what you described, I’d definitely stick with REML = TRUE—it’s the better choice when you want to properly account for the random effect structure.

2) A work example would help.

3) What model you use should be primarily based on your experimental design and your scientific knowledge of the field.

4) That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it’s fine to just drop that parameter—unless you have a strong scientific reason to keep it. Also, if your experimental design is fully nested, you might want to try adding the interaction term ('Subject:Cell') instead of just 'Cell'.

5) anova() or AIC() can be used to compare models with different random effect structures.



Best Regards,

Kalman Toth





On Thursday, February 20th, 2025 at 1:14 PM, Kim Pearce via R-sig-mixed-models <r-sig-mixed-models using r-project.org<mailto:r-sig-mixed-models using r-project.org>> wrote:



> Hello everyone,

>

>

>

> If I may, I would like to ask about the appropriateness of a particular model when the original design takes on a 3 level hierarchical structure.

>

>

>

> Say we have N subjects and each of these subjects has several cells. Within each cell there are mitochondria and a continuous measurement (Y) is recorded for each mitochondrion. In this design, mitochondria (level 1) are nested within cells (level 2) and cells are nested within subjects (level 3).

>

>

>

> For ease, say we are interested in establishing if patient disease group is related to Y. There are 3 patient disease groups recorded in the factor variable "Groupf". In addition, variable "Subject" identifies each of the N subjects and variable "Cell" identifies the cells (within the subjects).

>

>

>

> Consider the syntax:

>

>

>

> Model1<-lmer(Y~Groupf+(1|Subject/Cell),REML=FALSE,data=file1)

>

>

>

> The above would give us a fixed effect for Groupf as well as intercepts for the N subjects and intercepts for the Subject x Cell combinations.

>

>

>

> Hypothetically, say we found evidence of singularity (i.e. the estimated random intercept variance was near zero at the Subject and Subject x Cell levels), however, singularity was not flagged for a 2 level hierarchy (where mitochondria are nested within cells). Would it be valid to report such a 2 level model (i.e. where the "top level" was Cell) ?

>

>

>

> Model1<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1)

>

>

>

> or would it be more preferable to always consider Subject as the "top level" if singularity was not flagged in such a model i.e.

>

>

>

> Model1<-lmer(Y~Groupf+(1|Subject),REML=FALSE,data=file1)

>

>

>

> Many thanks for your appreciated views on this issue in advance.

>

> Kind regards,

>

> Kim

>

>

>

> Dr Kim Pearce PhD, CStat, Fellow HEA

> Senior Statistician

> Faculty of Medical Sciences Graduate School Room 3.14 3rd Floor Ridley

> Building 1 Newcastle University Queen Victoria Road Newcastle Upon

> Tyne

> NE1 7RU

>

> Tel: (0044) (0)191 208 8142

>

>

>

>

>

>

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.o
> rg> mailing list

> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat

> .ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-mixed-models&data=05%7C02%7Ckim.

> pearce%40newcastle.ac.uk%7Cb1d7b9423e284768ff1408dd5321f615%7C9c5012c9

> b61644c2a91766814fbe3e87%7C1%7C0%7C638758127525031103%7CUnknown%7CTWFp

> bGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIk

> FOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=duLSYpH3sZlu6GNE%2BR

> KtDtvOpTB01VSz6s7wfY0ZYRg%3D&reserved=0

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


More information about the R-sig-mixed-models mailing list