[R-sig-ME] Fwd: mixed effects models and multiple explanatory variables that are correlated

Tue Nov 27 09:00:09 CET 2012

Hi Alan and Alain,

I see that you might want to have >10 subjects within each level of a
factor that is constructed by making one level for each combination of
the random effects.
However, I have 3 REs that are continuous variables:
age, ADHD_symptom_scores, TBV[Total_Brain_Volume].
So, in that situation, I'm not sure how to calculate what you suggest
unless I transform them into categorical variables, e.g.:
dat$age_5_levels = cut(dat$age, 5)
In that situation I would most likely have <10 subjects in some
of the levels of a factor that is constructed by making levels from
each combination of the REs.
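
A minimal sketch of that check, assuming the three continuous variables are
each cut into 5 equal-width bins. The data frame below is simulated purely
for illustration (the values and the names adhd_5_levels, tbv_5_levels and
combo are made up here); only the column names age, ADHD_symptom_scores and
TBV follow the real data.

```r
# Simulated stand-in for `dat`; the values are invented for illustration only.
set.seed(1)
n <- 170
dat <- data.frame(
  age                 = runif(n, 6, 18),
  ADHD_symptom_scores = rpois(n, 8),
  TBV                 = rnorm(n, 1200, 100)
)

# Bin each continuous variable into 5 equal-width intervals.
dat$age_5_levels  <- cut(dat$age, 5)
dat$adhd_5_levels <- cut(dat$ADHD_symptom_scores, 5)  # hypothetical name
dat$tbv_5_levels  <- cut(dat$TBV, 5)                  # hypothetical name

# One factor level per combination of the binned variables
# (drop = FALSE keeps the empty combinations as well).
dat$combo <- interaction(dat$age_5_levels, dat$adhd_5_levels,
                         dat$tbv_5_levels, drop = FALSE)

# Number of subjects in each combination.
cell_counts   <- table(dat$combo)
n_cells_total <- length(cell_counts)     # 5 * 5 * 5 combinations
n_cells_small <- sum(cell_counts < 10)   # combinations with fewer than 10 subjects

n_cells_total
n_cells_small
```

With 170 subjects spread over 125 possible combinations, most cells
necessarily hold fewer than 10 subjects, whatever the real data look like.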

Age and gender are not significantly different between experimental groups
in this dataset, so I experimented with excluding them from the REs.
I very much wanted to control for ADHD symptoms in the REs, as this
was the whole idea of the model; i.e. to try to isolate autistic traits
within ADHD subjects, and determine to what extent these traits
are influenced by particular anatomical structures when the ADHD
symptom levels are controlled for in the REs.

So, trying to eliminate unnecessary REs (age and gender), the model is:

another.moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum +
                           Amygdala + Hippocampus,
                       random = ~ ADHD_symptom_scores + scanner_binary + TBV,
                       data = dat)

I would like to keep total brain volume (TBV: a continuous variable),
and scanner type (scanner_binary: 0 or 1, for two different scanner
types) as REs.
These variables are significantly different between experimental groups.
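
For comparison, here is a hedged sketch of the alternative raised earlier in
the thread: if there is one row per child, the same covariates can be entered
as ordinary fixed-effect terms in a linear regression. The data are simulated
and the object name fit_lm is made up for illustration; this is not a claim
about what the right model for the real data is.

```r
# Simulated stand-in for `dat`; the values are invented for illustration only.
set.seed(2)
n <- 170
dat <- data.frame(
  Diagnosis           = factor(rep(c("ADHD", "Sibling", "Control"), length.out = n)),
  Striatum            = rnorm(n),
  Amygdala            = rnorm(n),
  Hippocampus         = rnorm(n),
  ADHD_symptom_scores = rpois(n, 8),
  scanner_binary      = factor(rep(0:1, length.out = n)),
  TBV                 = rnorm(n, 1200, 100)
)
dat$autism_spectrum_scores <- rnorm(n)

# Covariates to be "controlled for" enter as fixed effects (plain covariates),
# not as random terms.
fit_lm <- lm(autism_spectrum_scores ~ Diagnosis + Striatum + Amygdala +
               Hippocampus + ADHD_symptom_scores + scanner_binary + TBV,
             data = dat)

summary(fit_lm)
```

In this form each continuous covariate costs a single slope parameter, rather
than a variance estimated across one level per distinct value.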

For the above model, again, I'd be grateful for any pointers regarding how to
assess or improve its validity.
Apologies for the necessary hand-holding.

Thanks; Larry

On Mon, Nov 26, 2012 at 1:47 PM, Alan Haynes <aghaynes at gmail.com> wrote:
> I think that part of what Alain was getting at was that random effects
> require quite a few levels to calculate the variance, so RE such as age are
> generally not recommended - you get a bad estimate of the variance from 2
> points.
>
> I think he's also suggesting that with only 170 data points, the likelihood
> of overfitting is quite high when you have so many variables in your models.
> If you made a factor with one level for each combination of your random
> effects, how many data points would fall into each category? I've heard it
> suggested that with fewer than 10 you'll probably start running into
> difficulty... I don't remember where from though, I'm afraid...
>
> HTH
>
> Alan
>
>
> On 26 November 2012 12:58, Laurence O'Dwyer <larodwyer at gmail.com> wrote:
>>
>> Hi Alain,
>>
>> Thank you for the reply. As a follow-up, I have a question or two
>> relating to your pointers and suggestions.
>>
>> >>
>> >> Hello to mixed-effects model experts,
>> >>
>> >> I am currently trying to run an analysis on structural MRI data and
>> >> would like to use glmer or MCMCglmm to model my data. I have basic
>> >> statistical knowledge and would appreciate any guidance in the use of
>> >> these R-tools from experts in mixed effects models.
>> >>
>> >> In a crude way, I am interested in a model that might look something
>> >> like the following:
>> >>
>> >>
>> >>
>> >> moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum + Thalamus +
>> >>                    Amygdala + Hippocampus,
>> >>                random = ~ ADHD_symptom_scores + age + scanner_type +
>> >>                    gender + total_brain_volume,
>> >>                data = dat)
>> >>
>> >>
>> >
>> >
>> > Are you using gender as a random effect?
>> >
>>
>> Yes, I am using gender as a random variable, as there are differences
>> (although non-significant) in the ratio of Males:Females in each
>> experimental group.
>>
>> >> It is a study of ADHD and autism. I have data for ~170 children with
>> >> ADHD, ~70 unaffected siblings, and ~80 controls - this is the fixed
>> >> factor "Diagnosis".
>> >>
>> >> I have the volumes of particular structures in the brain. These are the
>> >> fixed factors Striatum, Thalamus, etc. I am interested to know their
>> >> relationship with a scale of autistic traits (NOT ADHD traits) within
>> >> all experimental groups. For example, smaller volumes in the Striatum
>> >> may be associated with increased autistic traits.
>> >>
>> >> For the random effects, I want to control for differences in ADHD
>> >> symptoms, age, scanner type (two different scanners were used to collect
>> >> the volumetric data), gender and total brain volume.
>> >
>> > Yes you do. That is not a good idea. You may want to read a little bit
>> > on mixed modelling before doing this. Your model is overly complicated
>> > for 170 observations. I actually wonder whether this is mixed effects
>> > modelling; do you have multiple observations per child? If not...then it
>> > seems ordinary linear regression?
>>
>>
>> Sorry, I am not clear here on what you are referring to as "not a good
>> idea"?
>> Each child underwent scanning. There are multiple observations (MRI
>> volumetrics, as well as symptom counts from diagnostic questionnaires) for
>> all children.
>> I felt mixed-effects models would be most effective and robust in this
>> situation, as I am interested to know how the fixed effects influence the
>> response variable while controlling for a range of random effects that
>> influence the variance of the response.
>> This analysis also ties in with earlier work assessing the relationship
>> between autism scores and total brain white matter volume and total brain
>> grey matter volume, for which mixed-effects models were quite informative.
>> MCMC was used, as the explanatory variables are not normally distributed.
>>
>>
>> As suggested, I simplified the model and looked at the VIFs. I now
>> have 3 Fixed Effects with GVIFs:
>> Striatum    1.507092
>> Amygdala    1.281519
>> Hippocampus 1.557735
>>
>>
>> So, I would like to know if the resulting model could be considered
>> statistically sound, or if there are still gaping holes in its
>> statistical credibility:
>>
>> try.5 = MCMCglmm(ASD_spectrum_VISK ~ Diagnosis_Simple + Striatum +
>>                      Amygdala + Hippocampus,
>>                  random = ~ Combined_Symptoms_inatt_plus_hyper + age +
>>                      scanner_binary + Gender + TBV,
>>                  data = dat)
>>
>>
>> This leads to a result that can be reasonably well interpreted
>> biologically and which is in line with the study hypothesis: ADHD
>> diagnosis has a significant effect on the autism score, and Striatal
>> volume (p=0.078) has a borderline significant effect on autism score.
>>
>> I am particularly keen to know whether my attempts to control for ADHD
>> symptoms, as well as age, scanner type, etc., are adequately dealt with
>> in the random effects section, or whether I need to look into the
>> specification of a prior, which is noted in some of the MCMCglmm
>> documentation.
>>
>> Any advice is greatly appreciated.
>>
>> With thanks; Larry
>>
>>
>> >> A key point of the analysis would be to establish the relationship
>> >> between structural volumes and autistic scores, when levels of ADHD
>> >> have been controlled for.
>> >>
>> >> One problem is that all the structural volumes are closely correlated.
>> >> Previously, when working with two structural volumes that were
>> >> correlated, I used the regression residuals of one structural volume
>> >> relative to the other to isolate the unique contribution of each
>> >> explanatory variable, independent from what was shared between them.
>> >> But, I don't think I can use this approach with four structures that
>> >> are highly correlated.
>> >>
>> >> There are probably many other statistical flies in the ointment
>> >> relating to the above. If anyone has any pointers as to how to deal
>> >> with the situation when multiple explanatory variables are correlated,
>> >
>> > dump some of them...after making scatterplots, and calculate VIF values.
>> > Or use them, and accept that SEs will be blown up.
>> >
>> > Kind regards,
>> >
>> > Alain
>> >> in a mixed-effects models framework, they would be appreciated.
>> >>
>> >>                  Thanks; Larry
>> >>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>