[R-sig-ME] mixed effects models and multiple explanatory variables that are correlated

Wed Nov 28 00:10:39 CET 2012

Hi Jarrod,
Thanks for the notes.
Regarding what I want the model to look like: I am interested in modelling
autism scores (the response variable) as a function of anatomical
structures (Striatum, Amygdala, etc.) and ADHD diagnosis (the Diagnosis
factor has three levels: con, unaffected sibling, ADHD).

Autism scores are elevated in the ADHD group, and with the aid of an
MCMCglmm model I would like to identify the anatomical structures that may
contribute to raised autism scores. To focus specifically on the autism
phenotype, I would also like to control for each subject's level of ADHD
symptoms (ADHD_symptom_scores).

In light of the earlier suggestions, I now use Age and Gender as fixed effects.
Regarding Alan's comment about "scanner_binary": I was worried about the
variance in anatomical volumes that might be attributable to the use of two
different scanners. In fact, it is negligible for structural volumes, but I
have data from another imaging modality (diffusion tensor imaging) where
there are troublesome differences in the diffusion values depending on the
scanner type. For example, the mean diffusion of all controls scanned on
scanner_1 is significantly higher than the mean diffusion of all controls
scanned on scanner_2. For future models, I would like to know how best to
deal with these scanner differences.
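Two separate scanner problems are mixed together here: a shift in the *mean* diffusion between scanners, and a difference in the *spread* of values between scanners. A minimal base-R sketch on simulated data (every number below is invented for illustration, and `lm()` stands in only for the mean part of the model, not for MCMCglmm) shows that a scanner fixed effect absorbs the mean offset, while the residual spread still differs by scanner. That second problem is the one Jarrod's `rcov=~idh(scanner_binary):units` suggestion further down the thread is aimed at.

```r
# Simulated mean diffusion (MD) values; the scanner offset, group effect and
# per-scanner residual SDs are invented for illustration only.
set.seed(7)
n       <- 400
scanner <- factor(rep(c("scanner_1", "scanner_2"), each = n / 2))
group   <- factor(sample(c("con", "ADHD"), n, replace = TRUE))
md <- 0.80 +
  0.05 * (group == "ADHD") +                 # hypothetical group effect
  0.10 * (scanner == "scanner_1") +          # scanner_1 reads higher on average
  rnorm(n, sd = ifelse(scanner == "scanner_1", 0.06, 0.03))  # and is noisier

# Scanner as a fixed effect removes the mean offset from the group comparison
fit <- lm(md ~ group + scanner)
offset_hat <- unname(coef(fit)["scannerscanner_2"])  # close to -0.10

# Residual spread still differs by scanner; a constant-variance model ignores this
resid_sd <- tapply(resid(fit), scanner, sd)
offset_hat
resid_sd
```

Under these assumptions, the MCMCglmm analogue would presumably be to put scanner in the fixed formula for the mean shift and to keep a per-scanner residual variance (with scanner coded as a factor) for the spread.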

So, trying to adapt the model, I am now looking at:

i.moo = MCMCglmm(Autism_score ~ Diagnosis + Striatum + Amygdala + Hippocampus +
                   age + Gender,
                 random = ~idh(sqrt(ADHD_symptom_score)):units,
                 rcov   = ~idh(scanner_binary):units,  # assumes scanner_binary is a factor
                 data   = dat)

As you stated previously, the random-effects term allows the residual
variance to change according to v1 + ADHD_symptom_scores*v2, where v1 is the
units variance and v2 is the variance associated with the random term.
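The v1 + ADHD_symptom_scores*v2 rule can be checked with a small simulation. The sketch below assumes that `idh(sqrt(x)):units` gives each observation its own random effect multiplied by sqrt(x), so that this term contributes x*v2 to the variance on top of the units residual v1; the values of v1, v2 and the symptom scores are made up. Note that both pieces have mean zero, so the term changes only the variance, not the expected autism score.

```r
# Residual variance implied by random = ~idh(sqrt(x)):units plus the default
# units residual: each observation gets e_i = u_i + sqrt(x_i) * b_i, with
# u_i ~ N(0, v1) and b_i ~ N(0, v2), so Var(e_i) = v1 + x_i * v2.
# v1, v2 and the symptom scores below are invented for illustration.
implied_var <- function(x, v1, v2) v1 + x * v2

set.seed(1)
v1 <- 2
v2 <- 0.5
scores <- c(0, 4, 16)                     # hypothetical ADHD symptom scores
x <- rep(scores, each = 1e5)              # many simulated children per score
e <- rnorm(length(x), 0, sqrt(v1)) +      # units residual, variance v1
  sqrt(x) * rnorm(length(x), 0, sqrt(v2)) # scaled random term, variance x * v2

empirical <- tapply(e, x, var)            # simulated variance at each score
implied   <- implied_var(scores, v1, v2)  # 2, 4 and 10
round(cbind(empirical, implied), 2)
```

The sqrt() is what makes the variance grow linearly in the score, and it also requires the scores to be non-negative.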

This model produces a significant main effect of Striatum. However, I'm not
clear whether the random-effects term means that differences in
ADHD_symptom_scores between subjects are controlled for. If they are not,
then a negative correlation between the volume of the Striatum and
Autism_score might also be explained by the high ADHD symptom scores of
subjects who have a high Autism_score. I am particularly interested in
isolating the structural correlates that are unique to Autism_score, and I
was hoping that an appropriate random-effects term would allow me to get at
that.
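The confounding scenario described above can be sketched with a simulation in which ADHD symptoms drive both a smaller striatum and a higher autism score, while the striatum has no direct effect at all (all coefficients are invented). A variance term cannot remove that shared part of the mean; adjusting for ADHD symptoms as a fixed covariate can. Base R `lm()` is used here as a stand-in for the fixed-effect part of the model; in MCMCglmm the analogous step would presumably be adding ADHD_symptom_score to the fixed formula (bearing in mind that it will be strongly related to Diagnosis).

```r
# Hypothetical simulation: ADHD symptoms drive both a smaller striatum and a
# higher autism score; striatum has NO direct effect on autism here.
set.seed(42)
n        <- 5000
adhd     <- rnorm(n)                              # standardised ADHD symptom score
striatum <- -0.6 * adhd + rnorm(n, sd = 0.8)      # volume shares variation with ADHD
autism   <-  0.7 * adhd + rnorm(n)                # autism score depends on ADHD only

# Mean model without ADHD: the striatum slope picks up the ADHD pathway
b_unadj <- unname(coef(lm(autism ~ striatum))["striatum"])        # near -0.42
# Mean model with ADHD as a fixed covariate: the striatum slope is near 0
b_adj   <- unname(coef(lm(autism ~ striatum + adhd))["striatum"])
c(unadjusted = b_unadj, adjusted = b_adj)
```

The unadjusted slope is spurious by construction: it equals the ADHD-autism effect routed through the ADHD-striatum association (0.7 x -0.6 / 1).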

Any pointers are much appreciated, as is your patience in dealing with the
beginner nature of the questions.

Cheers; Larry

On Tue, Nov 27, 2012 at 11:07 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
> HI Larry,
>
> It's not clear to me exactly what you want the model to look like, but I'm
> fairly certain the model you have specified in MCMCglmm (or its lmer
> equivalent) is not going to make much sense.
>
> It is the effects that are random, not the predictors. If you have a
> continuous predictor you have a single effect, and so estimating the
> variance of the effect is a non-starter. You should treat the effect as
> fixed. MCMCglmm (and lmer) I believe will convert your continuous variable
> into a factor if you specify it as ~ADHD_symptom_scores+... and then you
> will end up with as many effects as there are unique values. Estimating the
> variance of these effects is possible, but is it meaningful? Probably not.
>
> Categorical predictors with 2 levels (e.g. scanner_binary) suffer the same
> problem - the precision on the variance component will be so poor (and
> presumably the replication for each level so high) that you might as well
> treat the effects as fixed.
>
> Alan seems to think that you want to model the fact that the variance in the
> response is non-constant between groups (or as a function of a covariate).
> Is this true? If it is then rcov=~idh(scanner_binary):units will allow the
> residual variance to vary between the two groups.
> random=~idh(sqrt(ADHD_symptom_scores)):units  allows the residual variance
> to change according to v1+ADHD_symptom_scores*v2  where v1 is the units
> variance and v2 is the variance associated with the random term.
>
> Cheers,
>
> Jarrod
>
>
>
>
> Quoting Alan Haynes <aghaynes at gmail.com> on Tue, 27 Nov 2012 10:01:07 +0100:
>
>> Hi Larry,
>>
>> I don't know how MCMCglmm handles the REs in terms of random slopes and/or
>> intercepts, so others will be able to provide better advice with regard to
>> your continuous variables.
>>
>> Does your scanner actually affect the variance? You could check this
>> visually and, if not, add it as a main effect; this is quite often
>> recommended for factors with only a couple of levels. The same would go
>> for age and gender.
>>
>> You might find http://glmm.wikidot.com/faq useful. This has sections on
>> whether or not to use something as a fixed or RE, assessing REs and loads
>> of other stuff relating to [G]LMMs.
>>
>> HTH
>>
>> Alan
>>
>>
>>
>> --------------------------------------------------
>> Email: aghaynes at gmail.com
>> Mobile: +41794385586
>> Skype: aghaynes
>>
>>
>> On 26 November 2012 17:07, Laurence O'Dwyer <larodwyer at gmail.com> wrote:
>>
>>> Hi Alan and Alain,
>>> I see that you might want to have more than 10 subjects within each
>>> level of a factor that is constructed by making one level for each
>>> combination of the random effects.
>>> However, I have 3 REs that are continuous variables:
>>> age, ADHD_symptom_scores, TBV (Total_Brain_Volume).
>>> So, in that situation, I'm not sure how to calculate what you suggest
>>> unless I transform them into categorical variables, e.g.:
>>> dat$age_5_levels = cut(dat$age, 5)
>>> In that situation I would most likely have fewer than 10 subjects in some
>>> of the levels of a factor that is constructed by making levels from
>>> each combination of the REs.
>>>
>>> Age and gender are not significantly different between experimental
>>> groups in this dataset, so I experimented with excluding them from the
>>> REs. I very much wanted to control for ADHD symptoms in the REs, as this
>>> was the whole idea of the model, i.e. to try to isolate autistic traits
>>> within ADHD subjects and determine to what extent these traits are
>>> influenced by particular anatomical structures when the ADHD symptom
>>> levels are controlled for in the REs.
>>>
>>> So, trying to eliminate unnecessary REs (age and gender), the model is:
>>>
>>> another.moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum +
>>>                          Amygdala + Hippocampus,
>>>                        random = ~ADHD_symptom_scores + scanner_binary + TBV,
>>>                        data = dat)
>>>
>>> I would like to keep total brain volume (TBV: a continuous variable),
>>> and scanner type (scanner_binary: 0 or 1, for two different scanner
>>> types) as REs.
>>> These variables are significantly different between experimental groups.
>>>
>>> For the above model, again, I'd be grateful for any pointers regarding
>>> how to assess or improve its validity.
>>> Apologies for the necessary hand-holding.
>>>
>>> Thanks; Larry
>>>
>>>
>>> On Mon, Nov 26, 2012 at 1:47 PM, Alan Haynes <aghaynes at gmail.com> wrote:
>>> > I think that part of what Alain was getting at was that random effects
>>> > require quite a few levels to calculate the variance, so REs such as age
>>> > are generally not recommended: you get a bad estimate of the variance
>>> > from 2 points.
>>> >
>>> > I think he's also suggesting that with only 170 data points, the
>>> > likelihood of overfitting is quite high when you have so many variables
>>> > in your models.
>>> > If you made a factor with one level for each combination of your random
>>> > effects, how many data points would fall into each category? I've heard
>>> > it suggested that with fewer than 10 you'll probably start running into
>>> > difficulty... I don't remember where from, though, I'm afraid.
>>> >
>>> > HTH
>>> >
>>> > Alan
>>> >
>>> >
>>> >
>>> > On 26 November 2012 12:58, Laurence O'Dwyer <larodwyer at gmail.com> wrote:
>>> >>
>>> >> Hi Alain,
>>> >> Thank you for the reply. As a follow-up, I have a question or two
>>> >> relating to your pointers and suggestions.
>>> >>
>>> >> >>
>>> >> >> Hello to mixed-effects model experts,
>>> >> >>
>>> >> >> I am currently trying to run an analysis on structural MRI data and
>>> >> >> would like to use glmer or MCMCglmm to model my data. I have basic
>>> >> >> statistical knowledge and would appreciate any guidance in the use of
>>> >> >> these R tools from experts in mixed-effects models.
>>> >> >>
>>> >> >> In a crude way, I am interested in a model that might look something
>>> >> >> like the following:
>>> >> >>
>>> >> >> moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum + Thalamus +
>>> >> >>                  Amygdala + Hippocampus,
>>> >> >>                random = ~ADHD_symptom_scores + age + scanner_type +
>>> >> >>                  gender + total_brain_volume,
>>> >> >>                data = dat)
>>> >> >>
>>> >> >
>>> >> >
>>> >> > Are you using gender as random effect?
>>> >> >
>>> >>
>>> >> Yes, I am using gender as a random variable, as there are differences
>>> >> (although non-significant) in the ratio of Males:Females in each
>>> >> experimental group.
>>> >>
>>> >> >> It is a study of ADHD and autism. I have data for ~170 children with
>>> >> >> ADHD, ~70 unaffected siblings and ~80 controls; this is the fixed
>>> >> >> factor "Diagnosis".
>>> >> >>
>>> >> >> I have the volumes of particular structures in the brain. These are
>>> >> >> the fixed effects Striatum, Thalamus, etc. I am interested to know
>>> >> >> their relationship with a scale of autistic traits (NOT ADHD traits)
>>> >> >> within all experimental groups. For example, smaller volumes in the
>>> >> >> Striatum may be associated with increased autistic traits.
>>> >> >>
>>> >> >> For the random effects, I want to control for differences in ADHD
>>> >> >> symptoms, age, scanner type (two different scanners were used to
>>> >> >> collect the volumetric data), gender and total brain volume.
>>> >> >
>>> >> > Yes you do. That is not a good idea. You may want to read a little
>>> >> > bit
>>> >> > on mixed modelling before doing this. Your model is overly
>>> >> > complicated
>>> >> > for 170 observations. I actually wonder whether this is mixed
>>> >> > effects
>>> >> > modelling; do you have multiple observations per child? If
>>> >> > not...then
>>> it
>>> >> > seems ordinary linear regression?
>>> >>
>>> >>
>>> >> Sorry, I am not clear on what you are referring to as "not a good idea".
>>> >> Each child underwent scanning. There are multiple observations (MRI
>>> >> volumetrics, as well as symptom counts from diagnostic questionnaires)
>>> >> for all children.
>>> >> I felt mixed-effects models would be most effective and robust in this
>>> >> situation, as I am interested to know how the fixed effects influence
>>> >> the response variable while controlling for a range of random effects
>>> >> that influence the variance of the response.
>>> >> This analysis also ties in with earlier work assessing the relationship
>>> >> between autism scores and total brain white matter volume and total
>>> >> brain grey matter volume, for which mixed-effects models were quite
>>> >> informative. MCMC was used, as the explanatory variables are not
>>> >> normally distributed.
>>> >>
>>> >>
>>> >> As suggested, I simplified the model and looked at the VIFs. I now
>>> >> have 3 fixed effects with GVIFs:
>>> >>   Striatum      1.507092
>>> >>   Amygdala      1.281519
>>> >>   Hippocampus   1.557735
>>> >>
>>> >>
>>> >> So, I would like to know whether the resulting model could be considered
>>> >> statistically sound, or whether there are still gaping holes in its
>>> >> statistical credibility:
>>> >>
>>> >> try.5 = MCMCglmm(ASD_spectrum_VISK ~ Diagnosis_Simple + Striatum +
>>> >>                    Amygdala + Hippocampus,
>>> >>                  random = ~Combined_Symptoms_inatt_plus_hyper + age +
>>> >>                    scanner_binary + Gender + TBV,
>>> >>                  data = dat)
>>> >>
>>> >>
>>> >> This leads to a result that can be reasonably well interpreted
>>> >> biologically and which is in line with the study hypothesis: ADHD
>>> >> diagnosis has a significant effect on the autism score, and Striatal
>>> >> volume (p=0.078) has a borderline significant effect on autism score.
>>> >>
>>> >> I am particularly keen to know if my attempts to control for ADHD
>>> >> symptoms, as well as age, scanner type, etc., are adequately
>>> >> dealt with in the random effects section, or whether or not I need to
>>> >> look into the specification of a prior which is noted in some of the
>>> >> MCMCglmm documentation.
>>> >>
>>> >> Any advice is greatly appreciated.
>>> >>
>>> >> With thanks; Larry
>>> >>
>>> >>
>>> >> >> A key point of the analysis would be to establish the relationship
>>> >> >> between structural volumes and autistic scores when levels of ADHD
>>> >> >> have been controlled for.
>>> >> >>
>>> >> >> One problem is that all the structural volumes are closely correlated.
>>> >> >> Previously, when working with two structural volumes that were
>>> >> >> correlated, I used the regression residuals of one structural volume
>>> >> >> relative to the other to isolate the unique contribution of each
>>> >> >> explanatory variable, independent of what was shared between them.
>>> >> >> But I don't think I can use this approach with four structures that
>>> >> >> are highly correlated.
>>> >> >>
>>> >> >> There are probably many other statistical flies in the ointment
>>> >> >> relating to the above. If anyone has any pointers as to how to deal
>>> >> >> with the situation when multiple explanatory variables are correlated
>>> >> >> in a mixed-effects model framework, they would be appreciated.
>>> >> >>
>>> >> >> Thanks; Larry
>>> >> >
>>> >> > Dump some of them... after making scatterplots and calculating VIF
>>> >> > values. Or use them, and accept that SEs will be blown up.
>>> >> >
>>> >> > Kind regards,
>>> >> >
>>> >> > Alain
>>> >>
>>> >> _______________________________________________
>>> >> R-sig-mixed-models at r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> >
>>> >
>>>
>>
>>
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
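Alain's advice earlier in the thread is to compute VIFs for the correlated volumes, and Larry reports GVIFs for Striatum, Amygdala and Hippocampus (presumably from car::vif; for a one-degree-of-freedom term the GVIF equals the ordinary VIF). A minimal base-R sketch of what that number measures: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others. The function name `vif_manual` and the simulated volumes, which share an invented common "size" component, are hypothetical.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors. vif_manual is a hypothetical helper.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical volumes that share a common "size" component
set.seed(3)
n    <- 320
size <- rnorm(n)
vols <- data.frame(
  Striatum    = 0.6 * size + rnorm(n, sd = 0.8),
  Amygdala    = 0.5 * size + rnorm(n, sd = 0.9),
  Hippocampus = 0.6 * size + rnorm(n, sd = 0.8)
)
vifs <- vif_manual(vols)
round(vifs, 2)
```

A VIF of 1.5 means the sampling variance of that coefficient is about 1.5 times what it would be if the predictor were uncorrelated with the others, which is the "SEs will be blown up" effect Alain mentions.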