[R-sig-ME] R-sig-mixed-models Digest, Vol 71, Issue 34
Jarrod Hadfield
j.hadfield at ed.ac.uk
Tue Nov 27 11:07:59 CET 2012
Hi Larry,
It's not clear to me exactly what you want the model to look like, but
I'm fairly certain the model you have specified in MCMCglmm (or the
lmer equivalent) is not going to make much sense.
It is the effects that are random not the predictors. If you have a
continuous predictor you have a single effect, and so estimating the
variance of the effect is a non-starter. You should treat the effect
as fixed. I believe MCMCglmm (and lmer) will convert your continuous
variable into a factor if you specify it in the random formula as
~ADHD_symptom_scores+..., and then you will end up with as many effects
as there are unique values. Estimating the variance of these effects is
possible, but is it meaningful? Probably not.
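To make the point about factor conversion concrete, here is a minimal base-R sketch. The data are simulated stand-ins (the sample size of 170 and the score distribution are assumptions, not the real dataset); it only shows that coercing a continuous score to a factor yields one level per unique value.

```r
set.seed(1)

# Hypothetical stand-in for a continuous symptom score on 170 subjects
n <- 170
ADHD_symptom_scores <- round(rnorm(n, mean = 10, sd = 3), 2)

# Used as a grouping factor, the covariate gets one level per unique value
score_factor <- factor(ADHD_symptom_scores)
n_levels <- nlevels(score_factor)
n_unique <- length(unique(ADHD_symptom_scores))

# Number of observations falling into each level
obs_per_level <- table(score_factor)

cat("levels:", n_levels, " unique values:", n_unique, "\n")
cat("median observations per level:", median(obs_per_level), "\n")
```

With a finely measured score, levels will typically hold very few observations each, so the "random effects" are hard to separate from the residual.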
Categorical predictors with 2 levels (e.g. scanner_binary) suffer the
same problem - the precision on the variance component will be so poor
(and presumably the replication for each level so high) that you might
as well treat the effects as fixed.
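A rough way to see how poor the precision is with two levels: under the idealised assumption that the level effects are normal and observed without error (an assumption made only for this sketch), the sample variance s^2 of k effects satisfies (k-1)*s^2/sigma^2 ~ chi-square(k-1). The helper `var_ci_ratio` below is hypothetical and returns the 95% confidence limits for sigma^2 as multiples of s^2.

```r
# Confidence limits for a variance, expressed as multiples of the
# sample variance, when it is estimated from k normal effects.
# Idealised: assumes the effects themselves are observed without error.
var_ci_ratio <- function(k, level = 0.95) {
  df <- k - 1
  alpha <- 1 - level
  c(lower = df / qchisq(1 - alpha / 2, df),
    upper = df / qchisq(alpha / 2, df))
}

ci_2  <- var_ci_ratio(2)   # e.g. a two-level factor such as scanner_binary
ci_20 <- var_ci_ratio(20)  # a grouping factor with 20 levels, for comparison

print(rbind(k2 = ci_2, k20 = ci_20))
```

With k = 2 the upper limit is on the order of a thousand times the point estimate, which is why a fixed effect is the more sensible choice there.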
Alan seems to think that you want to model the fact that the variance
in the response is non-constant between groups (or as a function of a
covariate). Is this true? If it is then
rcov=~idh(scanner_binary):units will allow the residual variance to
vary between the two groups.
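A minimal sketch of that rcov specification on simulated data follows. Everything about the data is an assumption (170 rows, a single Striatum covariate, residual SD of 1 vs 2 between scanners), and the prior is just a commonly used weak inverse-Wishart choice, not a recommendation. The MCMCglmm call is guarded so the data simulation still runs if the package is not installed.

```r
set.seed(2012)

# Simulated stand-in data: two scanners with different residual SDs
n <- 170
dat <- data.frame(
  scanner_binary = factor(rep(c(0, 1), length.out = n)),
  Striatum = rnorm(n)
)
resid_sd <- ifelse(dat$scanner_binary == "1", 2, 1)
dat$autism_spectrum_scores <- 0.3 * dat$Striatum + rnorm(n, sd = resid_sd)

if (requireNamespace("MCMCglmm", quietly = TRUE)) {
  library(MCMCglmm)

  # One residual variance per scanner level; weak prior (an assumption)
  prior <- list(R = list(V = diag(2), nu = 0.002))

  fit <- MCMCglmm(autism_spectrum_scores ~ scanner_binary + Striatum,
                  rcov = ~idh(scanner_binary):units,
                  prior = prior, data = dat, verbose = FALSE)

  # Posterior summaries of the two scanner-specific residual variances
  print(summary(fit$VCV))
}
```

Note that scanner_binary also appears in the fixed formula, so the scanner can shift the mean as well as the residual variance.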
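random=~idh(sqrt(ADHD_symptom_scores)):units allows the residual
variance to change according to v1+ADHD_symptom_scores*v2, where v1 is
the units variance and v2 is the variance associated with the random
term. The square root is there because the effect is multiplied by the
covariate, so its variance contribution scales with the covariate squared.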
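The variance function v1 + x*v2 can be checked with a small base-R simulation. The values of v1, v2 and the covariate values 1, 4, 9 are arbitrary assumptions for illustration; each unit gets a random-term effect u with variance v2, multiplied by sqrt(x), plus a units residual e with variance v1.

```r
set.seed(42)

v1 <- 1.0   # units (residual) variance, assumed
v2 <- 0.5   # variance of the random term, assumed

# Three covariate values, many units at each so variances are well estimated
x <- rep(c(1, 4, 9), each = 20000)

u <- rnorm(length(x), mean = 0, sd = sqrt(v2))  # random-term effect per unit
e <- rnorm(length(x), mean = 0, sd = sqrt(v1))  # units residual

# Deviation from the fixed-effect prediction: sqrt(x) * u + e
y_dev <- sqrt(x) * u + e

empirical   <- tapply(y_dev, x, var)
theoretical <- v1 + c(1, 4, 9) * v2

print(rbind(empirical = empirical, theoretical = theoretical))
```

The empirical variances should track v1 + x*v2 closely, i.e. the variance (not the standard deviation) is linear in the covariate.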
Cheers,
Jarrod
Quoting Alan Haynes <aghaynes at gmail.com> on Tue, 27 Nov 2012 10:01:07 +0100:
> Hi Larry,
>
> I don't know how MCMCglmm handles the REs in terms of random slopes and/or
> intercepts, so others will be able to provide better advice with regard to
> your continuous variables.
>
> Does your scanner actually affect the variance? You could visually check
> this and if not add it as a main effect...this is quite often recommended
> for factors with only a couple of levels. The same would go for age and
> gender.
>
> You might find http://glmm.wikidot.com/faq useful. This has sections on
> whether or not to use something as a fixed or RE, assessing REs and loads
> of other stuff relating to [G]LMMs.
>
> HTH
>
> Alan
>
>
>
> --------------------------------------------------
> Email: aghaynes at gmail.com
> Mobile: +41794385586
> Skype: aghaynes
>
>
> On 26 November 2012 17:07, Laurence O'Dwyer <larodwyer at gmail.com> wrote:
>
>> Hi Alan and Alain,
>> I see that you might want to have >10 subjects within each level of a
>> factor that is constructed by making one level for each combination
>> of the random effects.
>> However, I have 3 REs that are continuous variables:
>> age, ADHD_symptom_scores, TBV[Total_Brain_Volume].
>> So, in that situation, I'm not sure how to calculate what you suggest
>> unless I transform them into categorical variables, e.g.:
>> dat$age_5_levels = cut(dat$age, 5)
>> In that situation I would most likely have <10 subjects in some
>> of the levels of a factor that is constructed by making levels from
>> each combination of the REs.
>>
>> Age and gender, are not significantly different between experimental groups
>> in this dataset, so I experimented with excluding them from the REs.
>> I very much wanted to control for ADHD symptoms in the REs, as this
>> was the whole idea of the model; i.e. to try to isolate autistic traits
>> within ADHD subjects, and determine to what extent these traits
>> are influenced by particular anatomical structures when the ADHD
>> symptom levels are controlled for in the REs.
>>
>> So, trying to eliminate unnecessary REs (age and gender), the model is:
>>
>> another.moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum +
>>                            Amygdala + Hippocampus,
>>                        random = ~ADHD_symptom_scores
>>                            + scanner_binary
>>                            + TBV,
>>                        data = dat)
>>
>> I would like to keep total brain volume (TBV: a continuous variable),
>> and scanner type (scanner_binary: 0 or 1, for two different scanner
>> types) as REs.
>> These variables are significantly different between experimental groups.
>>
>> For the above model, again, I'd be grateful for any pointers regarding how
>> to assess or improve its validity.
>> Apologies for the necessary hand holding.
>>
>> Thanks; Larry
>>
>>
>> On Mon, Nov 26, 2012 at 1:47 PM, Alan Haynes <aghaynes at gmail.com> wrote:
>> > I think that part of what Alain was getting at was that random effects
>> > require quite a few levels to calculate the variance, so REs such as age
>> > are generally not recommended - you get a bad estimate of the variance
>> > from 2 points.
>> >
>> > I think he's also suggesting that with only 170 data points, the
>> > likelihood of overfitting is quite high when you have so many variables
>> > in your models.
>> > If you made a factor with one level for each combination of your random
>> > effects, how many data points would fall into each category? I've heard
>> > it suggested that with fewer than 10 you'll probably start running into
>> > difficulty... I don't remember where from though, I'm afraid...
>> >
>> > HTH
>> >
>> > Alan
>> >
>> > --------------------------------------------------
>> > Email: aghaynes at gmail.com
>> > Mobile: +41794385586
>> > Skype: aghaynes
>> >
>> >
>> > On 26 November 2012 12:58, Laurence O'Dwyer <larodwyer at gmail.com> wrote:
>> >>
>> >> Hi Alain,
>> >> Thank you for the reply. As a follow-up, I have a
>> >> question or two relating to your pointers and suggestions.
>> >>
>> >> >>
>> >> >> Hello to mixed-effects model experts,
>> >> >>
>> >> >> I am currently
>> >> >> trying to run an analysis on structural MRI data and would like to use
>> >> >> glmer or MCMCglmm to model my data. I have basic statistical knowledge
>> >> >> and would appreciate any guidance in the use of these R-tools from
>> >> >> experts in mixed effects models.
>> >> >>
>> >> >> In a crude way, I am interested in a model that might look
>> >> >> something like the following:
>> >> >>
>> >> >> moo = MCMCglmm(autism_spectrum_scores ~ Diagnosis + Striatum + Thalamus +
>> >> >>                    Amygdala + Hippocampus,
>> >> >>                random = ~ADHD_symptom_scores
>> >> >>                    + age
>> >> >>                    + scanner_type
>> >> >>                    + gender
>> >> >>                    + total_brain_volume,
>> >> >>                data = dat)
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > Are you using gender as random effect?
>> >> >
>> >>
>> >> Yes, I am using gender as a random variable, as there are differences
>> >> (although non-significant) in the ratio of Males:Females in each
>> >> experimental group.
>> >>
>> >> >> It is a study of ADHD and autism. I have data for ~170 children with
>> >> >> ADHD, ~70 unaffected siblings, and ~80 controls - this is the fixed
>> >> >> factor "Diagnosis".
>> >> >>
>> >> >> I have the volumes of particular structures in the brain. These are the
>> >> >> fixed factors Striatum, Thalamus, etc. I am interested to know their
>> >> >> relationship with a scale of autistic traits (NOT ADHD traits) within
>> >> >> all experimental groups. For example, smaller volumes in the Striatum
>> >> >> may be associated with increased autistic traits.
>> >> >>
>> >> >> For the random effects, I want to control for differences in ADHD
>> >> >> symptoms, age, scanner type (two different scanners were used to collect
>> >> >> the volumetric data), gender and total brain volume.
>> >> >
>> >> > Yes you do. That is not a good idea. You may want to read a little bit
>> >> > on mixed modelling before doing this. Your model is overly complicated
>> >> > for 170 observations. I actually wonder whether this is mixed effects
>> >> > modelling; do you have multiple observations per child? If not...then
>> >> > it seems ordinary linear regression?
>> >>
>> >>
>> >> Sorry, I am not clear here on what you are referring to as "not a good
>> >> idea"?
>> >> Each child underwent scanning. There are multiple observations (MRI
>> >> volumetrics, as well as symptom counts from diagnostic questionnaires),
>> >> for all children.
>> >> I felt mixed-effects models would be most effective and robust in this
>> >> situation as I am interested to know how the fixed effects influence the
>> >> response variable, while controlling for a range of random effects that
>> >> influence the variance of the response.
>> >> This analysis also ties in with earlier work assessing the relationship
>> >> between autism scores and total brain white matter volume and total brain
>> >> grey matter volume, for which mixed-effects models were quite informative.
>> >> MCMC was used, as the explanatory variables are not normally distributed.
>> >>
>> >>
>> >> As suggested, I simplified the model and looked at the VIFs. I now
>> >> have 3 Fixed Effects with GVIFs:
>> >> Striatum 1.507092
>> >> Amygdala 1.281519
>> >> Hippocampus 1.557735
>> >>
>> >>
>> >> So, I would like to know if the resulting model could be considered
>> >> statistically sound, or if there are still gaping
>> >> holes in its statistical credibility:
>> >>
>> >> try.5 = MCMCglmm(ASD_spectrum_VISK ~ Diagnosis_Simple + Striatum +
>> >>                      Amygdala + Hippocampus,
>> >>                  random = ~Combined_Symptoms_inatt_plus_hyper
>> >>                      + age
>> >>                      + scanner_binary
>> >>                      + Gender
>> >>                      + TBV,
>> >>                  data = dat)
>> >>
>> >>
>> >> This leads to a result that can be reasonably well interpreted
>> >> biologically and which is in line with
>> >> the study hypothesis: ADHD diagnosis has significant effect on the
>> >> autism score, and Striatal volume (p=0.078)
>> >> has a borderline significant effect on autism score.
>> >>
>> >> I am particularly keen to know if my attempts to control for ADHD
>> >> symptoms, as well as age, scanner type, etc., are adequately
>> >> dealt with in the random effects section, or whether or not I need to
>> >> look into the specification of a prior which is noted in some of the
>> >> MCMCglmm documentation.
>> >>
>> >> Any advice is greatly appreciated.
>> >>
>> >> With thanks; Larry
>> >>
>> >>
>> >> >> A key point of the analysis would be to establish the relationship
>> >> >> between structural volumes and autistic scores, when levels of ADHD
>> >> >> have been controlled for.
>> >> >>
>> >> >> One problem is that all the structural volumes are closely correlated.
>> >> >> Previously, when working with two structural volumes that were
>> >> >> correlated, I used the regression residuals of one structural volume
>> >> >> relative to the other to isolate the unique contribution of each
>> >> >> explanatory variable, independent from what was shared between them.
>> >> >> But, I don't think I can use this approach with four structures that
>> >> >> are highly correlated.
>> >> >>
>> >> >> There are probably many other statistical flies in the
>> >> >> ointment relating to the above. If anyone has any pointers as to how to
>> >> >> deal with the situation when multiple explanatory variables are
>> >> >> correlated,
>> >> >
>> >> > dump some of them...after making scatterplots, and calculate VIF
>> >> > values. Or use them, and accept that SEs will be blown up.
>> >> >
>> >> > Kind regards,
>> >> >
>> >> > Alain
>> >> >> in a mixed-effects models framework, they would be appreciated.
>> >> >>
>> >> >> Thanks; Larry
>> >> >>
>> >>
>> >> _______________________________________________
>> >> R-sig-mixed-models at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>> >
>>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.