[R-sig-ME] lmer analysis of identical twins data

Fri Dec 18 11:59:56 CET 2020

Hi all!

I have a question about using the lmer() function from the lme4 package in studies of identical twins, and I would appreciate your help.

We have a dataset of measured PFAS pollutant levels from ~50 pairs of monozygotic (identical) twins (n=100 individuals). The goal is to detect the PFAS pollutants whose levels differ significantly between the leaner individuals (L) and the individuals with obesity (F):

1. My plan was to use lmer() to test for differential PFAS levels while adjusting for sex, age (young/old) and sample extraction year, with the family ID as a random effect (to capture the extreme similarity between individuals that is caused by 'twinship'). The model formula is therefore 'pfasLogStandardized ~ LF + sex + youngOrOld + yearClass + (1 | familyID)'. However, I am not sure this is the best approach: treating 'twinship' as a random effect means that each level of the random effect contains only 2 samples (the two twins of one family). It seems that with such small groups the fitted model will have high variance. Is this the best approach in your opinion? Note that family ID is not completely independent of sex and age, since identical twins share the same sex and age.

2. The alternative that comes to mind is not to adjust for familyID (twinship) at all, but to run an ANOVA or Student's t-test on a model such as 'pfasLogStandardized ~ LF + sex + youngOrOld + yearClass'. The problem here is that the analysis does not account for the extreme similarity between the twins.

3. Another approach is to swap the roles in the lmer formula: include familyID as a fixed covariate and use a factor combining age, sex and year as the random effect, e.g. something like 'pfasLogStandardized ~ LF + familyID + (1 | RandFact)', where RandFact = as.factor(paste(sex, youngOrOld, yearClass)).
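
To make the three options concrete, here is a minimal R sketch on simulated data. The data frame `twins`, the helper `pair_level()`, the factor levels, the effect sizes and the seed are all hypothetical and only stand in for the real measurements. The sketch also assumes that each pair has one L and one F twin and that sex, age class and year class are the same for both twins of a pair. Option 3 is only written down as a formula, not fitted.

```r
library(lme4)

set.seed(1)
n_pairs <- 50

# Hypothetical simulated data (not the real measurements): two rows per family,
# one L and one F twin per pair, covariates shared within a pair.
pair_level <- function(levels) {
  factor(rep(sample(levels, n_pairs, replace = TRUE), each = 2))
}
twins <- data.frame(
  familyID   = factor(rep(seq_len(n_pairs), each = 2)),
  LF         = factor(rep(c("L", "F"), times = n_pairs), levels = c("L", "F")),
  sex        = pair_level(c("female", "male")),
  youngOrOld = pair_level(c("young", "old")),
  yearClass  = pair_level(c("early", "late"))
)
# Shared family effect plus individual noise; the 0.3 shift for F is arbitrary.
family_effect <- rep(rnorm(n_pairs, sd = 1), each = 2)
twins$pfasLogStandardized <- family_effect + 0.3 * (twins$LF == "F") +
  rnorm(2 * n_pairs, sd = 0.5)

# Option 1: random intercept per twin pair.
fit1 <- lmer(pfasLogStandardized ~ LF + sex + youngOrOld + yearClass + (1 | familyID),
             data = twins)

# Option 2: ordinary linear model that ignores the pairing.
fit2 <- lm(pfasLogStandardized ~ LF + sex + youngOrOld + yearClass, data = twins)
# Correlation of the two twins' residuals within each pair (rows are ordered by pair).
res2 <- matrix(residuals(fit2), ncol = 2, byrow = TRUE)
within_pair_cor <- cor(res2[, 1], res2[, 2])

# Option 3: familyID as a fixed effect, combined covariate factor as random effect.
twins$RandFact <- as.factor(paste(twins$sex, twins$youngOrOld, twins$yearClass))
formula3 <- pfasLogStandardized ~ LF + familyID + (1 | RandFact)
# Number of distinct RandFact levels observed within each family.
levels_per_family <- tapply(as.character(twins$RandFact), twins$familyID,
                            function(x) length(unique(x)))

summary(fit1)$coefficients
summary(fit2)$coefficients
within_pair_cor
table(levels_per_family)
```

In this simulation, within_pair_cor shows how much within-pair dependence is left in the residuals when the pairing is ignored (option 2), and levels_per_family shows that, when the covariates are shared within a pair, every family falls into exactly one RandFact level (option 3). Whether the same holds for the real data depends on whether yearClass can differ between the two twins of a pair.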

I would really appreciate your opinion on this issue and, more generally, on the best way to run this kind of analysis on identical-twin data while accounting for the extreme similarity between the twins.

Cheers,
