[R-sig-ME] Related fixed and random factors and planned comparisons in a 2x2 design
Phillip Alday
phillip.alday at mpi.nl
Wed Aug 9 15:27:40 CEST 2017
Another old post ...
As a rule of thumb, I wouldn't fit separate models to subsets of the data:
an overarching model tends to fall on my preferred side of the
bias-variance / underfitting-overfitting tradeoff.
For post hoc comparisons between groups, you can use techniques like
marginal / least-squares means, implemented in R in packages like lsmeans.
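
A minimal sketch with lsmeans (using the model and variable names from the
thread below; the data frame name dat is assumed, not from the original
post):

    library(lme4)
    library(lsmeans)

    # model with the group-by-item interaction and crossed random intercepts
    m <- lmer(voltage ~ group * item + (1 | participant) + (1 | channel),
              data = dat)

    # estimated marginal (least-squares) means, with pairwise comparisons
    # of group within each level of item
    lsmeans(m, pairwise ~ group | item)
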
Phillip
On 06/07/2016 10:27 AM, paul wrote:
> Dear Phillip,
>
> Many thanks for these resources and replies. They are indeed very
> helpful. I suppose that after I've done the contrast coding, I still have
> to subset the data (e.g., singling out the data for P) and use a reduced
> mixed model, as illustrated earlier, to do planned comparisons between
> groups? Or is there any alternative way to do so?
>
> Best regards,
>
> Paul
>
> 2016-06-07 2:57 GMT+02:00 Phillip Alday <Phillip.Alday at unisa.edu.au>:
>
> In terms of contrast coding, two more helpful resources are:
>
> http://talklab.psy.gla.ac.uk/tvw/catpred/
>
> http://palday.bitbucket.org/stats/coding.html
>
> Channel makes sense as a random effect / grouping term for your
> particular design, *not* nested within participant. The implicit
> crossing given by (1|Participant) + (1|Channel) [omitting any slope
> terms to focus on the grouping variables] models (1) interindividual
> differences in the EEG and (2) differences between electrodes,
> because closely located electrodes can be thought of as samples from
> a population consisting of a given Region of Interest (ROI),
> especially if the electrode placement is somewhat symmetric. The
> differences resulting from variance in electrode placement between
> participants will be covered by the implicit crossing of these two
> random effects.
>
> Note that using channel as a random effect is somewhat more
> difficult if you're doing a whole-scalp analysis, as sampling across
> the whole scalp can be viewed as sampling from multiple ROIs, i.e.
> multiple populations. Two possible solutions are (1) to include ROI
> in the fixed effects and keep channel in the random effects, and (2)
> to model channel as two or three continuous spatial variables (e.g.
> displacement from midline or displacement from center based on 10-20
> coordinates, or spatial coordinates of the sort used in source
> localisation) in the fixed effects. In the case of (1), the channel
> random effect would then be modelling the typical variance within
> ROIs (because that's hopefully the major source of variance
> structured by channel left over after modelling ROI and your
> experimental manipulation). If this within-ROI variance differs greatly
> between ROIs, then this may be a sub-optimal modelling
> choice. In the case of (2), it might still make sense to
> additionally model channel as a random effect (i.e. the RE with the
> factor consisting of channel names, the FE with the continuous
> coordinates), see Thierry Onkelinx's posts on the subject and
> http://rpubs.com/INBOstats/both_fixed_random , but I haven't thought
> about this enough nor examined the resulting model fits.
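>
> A minimal sketch of both options (assuming the data frame is called dat
> and has a factor ROI plus hypothetical 10-20-derived coordinates x and y;
> those names are illustrative, not from the original thread):
>
>     library(lme4)
>
>     # Option (1): ROI as a fixed effect, channel kept as a random effect
>     m1 <- lmer(voltage ~ group * item + ROI +
>                    (1 | Participant) + (1 | Channel), data = dat)
>
>     # Option (2): continuous electrode coordinates as fixed effects,
>     # optionally keeping the channel factor as a random effect as well
>     m2 <- lmer(voltage ~ group * item + x + y +
>                    (1 | Participant) + (1 | Channel), data = dat)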
>
> Best,
> Phillip
>
> -----Original Message-----
> From: R-sig-mixed-models
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of paul
> Sent: Tuesday, 7 June 2016 5:27 AM
> To: Houslay, Tom <T.Houslay at exeter.ac.uk>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Related fixed and random factors and planned
> comparisons in a 2x2 design
>
> Dear Tom,
>
> Thank you so much for these detailed replies and I appreciate your help!
>
> Sincerely,
>
> Paul
>
> 2016-06-06 21:51 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
>
> > Hi Paul,
> >
> >
> > I think you're right here in that actually you don't want to nest
> > channel inside participant (which led to that error message - sorry,
> > should have seen that coming!).
> >
> >
> > It's hard to know without seeing data plotted, but my guess from your
> > email is that you probably see some clustering both at individual
> > level and at channel level? Perhaps separate random effects, ie
> > (1|Participant) + (1|Channel), is the way to go (and then you
> > shouldn't have the problem as regards number of observations - instead
> > you'll have an intercept deviation for each of your N individuals, and
> > also intercept deviations for each of your 9 channels). You certainly
> > want to keep the participant intercept in though, as each individual
> > gets both items (right?), so you need to model that association. You
> > can use the variance components output from lmer to determine what
> > proportion of the total variance (conditional on your fixed effects)
> > is explained by each of these components; e.g.,
> > V(individual)/(V(individual) + V(channel) + V(residual)) would give you
> > the proportion explained by differences among individuals in their
> > voltage (a small sketch of this calculation follows the link below). It
> > would be cool to know whether differences among individuals, or among
> > channels, are driving the variation that you find. I think the sjp.lmer
> > function from the sjPlot package would be useful for looking at the
> > levels of your random effects:
> >
> >
> > http://strengejacke.de/sjPlot/sjp.lmer/
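> >
> > A minimal sketch of that proportion-of-variance calculation (assuming
> > an lmer fit named m with grouping factors named participant and channel):
> >
> >     vc <- as.data.frame(VarCorr(m))    # variance components table
> >     v  <- setNames(vc$vcov, vc$grp)    # participant, channel, Residual
> >     v["participant"] / sum(v)          # proportion due to individuals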
> >
> >
> > As for 'contrasts', again I haven't used that particular package, but
> > from a brief glance it looks like you're on the right track - binary
> > coding is the 'simple coding' as set out here:
> >
> >
> > http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
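> >
> > A minimal sketch of that simple (centred) coding for two-level factors
> > (assuming a data frame named dat with factors group and item):
> >
> >     dat$group <- factor(dat$group)
> >     dat$item  <- factor(dat$item)
> >     contrasts(dat$group) <- c(-0.5, 0.5)   # A = -0.5, B = +0.5
> >     contrasts(dat$item)  <- c(-0.5, 0.5)   # P = -0.5, Q = +0.5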
> >
> >
> > Good luck!
> >
> >
> > Tom
> >
> >
> > ------------------------------
> > *From:* paul <graftedlife at gmail.com>
> > *Sent:* 06 June 2016 20:06:02
> > *To:* Houslay, Tom
> > *Cc:* r-sig-mixed-models at r-project.org
> > *Subject:* Re: Related fixed and random factors and planned
> > comparisons in a 2x2 design
> >
> > Dear Tom,
> >
> > Many thanks for these very helpful comments and suggestions! May I ask
> > a few further questions:
> >
> > 1. I've been considering whether to cross or to nest the random
> > effects for quite a while. Data from the same channel across
> > participants do show corresponding trends (thus a bit different from
> > the case when, e.g., sampling nine neurons from the same individual).
> > Would nesting channel within participant deal with that relationship?
> >
> > 2. I actually also tried nesting channel within participant. However,
> > when I proceeded to run planned comparisons (I guess I'd better run
> > them because of their theoretical importance) based on this
> > mixed-effects modeling approach (as illustrated in the earlier mail but
> > with the random factor as (1|participant/channel), to maintain
> > consistency of analytical methods), R gave me an error message:
> >
> > Error: number of levels of each grouping factor must be < number of
> > observations
> >
> >
> > I think this is because in my data, each participant contributes only
> > one data point per channel and thus there are not enough data points. I
> > guess that probably means I can't go on in this direction to run the
> > planned comparisons... (?) I'm not quite sure how contrasts based on
> > binary dummy variables would be done and will try to explore that
> > further. But before fitting the mixed model, I had already set up
> > orthogonal contrasts for group and item in the dataset using the
> > function contrasts(). Does this have anything to do with what you
> > meant?
> >
> > 3. I worried about pseudoreplication when participant ID is not
> > included. Concerning this point, it later occurred to me that
> > pseudoreplication usually arises when multiple responses from the same
> > individual are grouped in the same cell, rendering the data within that
> > cell non-independent (similar to the case of repeated-measures ANOVA?
> > sorry if my understanding is wrong...). But, as mentioned earlier, in my
> > data each participant contributes only one data point per channel. When
> > channel alone is already modeled as a random factor, would that mean
> > that all data points within a cell come from different participants,
> > and thus that the independence assumption is satisfied in this case?
> > (Again, I'm sorry if my concept is wrong and would appreciate
> > instruction on this point...)
> >
> > Many, many thanks!
> >
> > Paul
> >
> > 2016-06-06 19:10 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
> >
> >> Hi Paul,
> >>
> >> I don't think anyone's responded to this yet, but my main point would
> >> be that you should check out Schielzeth & Nakagawa's 2012 paper
> >> 'Nested by design'
> >> (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00251.x/abstract)
> >> for a nice rundown on structuring your model for this type of data.
> >>
> >> It may also be worth thinking about how random intercepts work in a
> >> visual sense; there are a variety of tools that help you do this from
> >> a model (packages sjPlot, visreg, broom), or you can just plot
> >> different levels yourself (e.g. consider plotting the means for AP, AQ,
> >> BP, BQ; the same with mean values from each individual overplotted
> >> around these group means; and even the group means with all points
> >> shown, perhaps coloured by individual - ggplot is really useful for
> >> getting this type of figure together quickly).
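> >>
> >> A rough ggplot2 sketch of that kind of figure (assuming a data frame
> >> named dat with columns voltage, group, item, and participant):
> >>
> >>     library(ggplot2)
> >>     ggplot(dat, aes(x = interaction(group, item), y = voltage)) +
> >>       geom_jitter(aes(colour = participant), width = 0.1, alpha = 0.3,
> >>                   show.legend = FALSE) +       # all points, by individual
> >>       stat_summary(fun = mean, geom = "point", size = 4)  # cell means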
> >>
> >> As to some of your other questions:
> >>
> >> 1) You need to keep participant ID in. I'm not 100% on your data
> >> structure from the question, but you certainly seem to have repeated
> >> measures for individuals (I'm assuming that groups A and B each
> >> contain multiple individuals, none of whom were in both groups, and
> >> each of whom was shown both items P and Q, in a random order).
> >> It's not surprising that the effects of group are weakened if you
> >> remove participant ID, because you're then effectively entering
> >> pseudoreplication into your model (ie, telling your model that all
> >> the data points within a group are independent, when that isn't the case).
> >>
> >> 2) I think channel should be nested within individual, with a model
> >> something like model <- lmer(voltage ~ group * item +
> >> (1|participant/channel), data = ...)
> >>
> >> 3) This really depends on what your interest is. If you simply want
> >> to show that there is an overall interaction effect, then the
> >> p-value from a likelihood ratio test of the model with/without the
> >> interaction term gives the significance of this interaction, and then
> >> a plot of predicted values for the fixed effects (with data
> >> overplotted if possible) should show the trends (see the sketch after
> >> point 4 below). You could also use binary dummy variables to make more
> >> explicit contrasts, but it's worth reading up on these a bit more. I
> >> don't really use these types of comparisons very much, so I can't
> >> comment further I'm afraid.
> >>
> >> 4) Your item is like treatment in this case - you appear to be more
> >> interested in the effect of different items (rather than how much
> >> variation 'item' explains), so keep this as a fixed effect and not as
> >> random.
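> >>
> >> A minimal sketch of the likelihood ratio test mentioned in point 3
> >> (assuming a data frame named dat and ML fits, i.e. REML = FALSE):
> >>
> >>     library(lme4)
> >>     m_full <- lmer(voltage ~ group * item + (1|participant) + (1|channel),
> >>                    data = dat, REML = FALSE)
> >>     m_red  <- lmer(voltage ~ group + item + (1|participant) + (1|channel),
> >>                    data = dat, REML = FALSE)
> >>     anova(m_red, m_full)   # LRT for the group:item interaction term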
> >>
> >> Hope some of this is useful,
> >>
> >> Tom
> >>
> >>
> >> ________________________________________
> >>
> >>
> >> Message: 1
> >> Date: Fri, 3 Jun 2016 14:28:59 +0200
> >> From: paul <graftedlife at gmail.com>
> >> To: r-sig-mixed-models at r-project.org
> >> Subject: [R-sig-ME] Related fixed and random factors and planned
> >> comparisons in a 2x2 design
> >> Message-ID:
> >> <CALS4JYfoTbhwhy8S0kHePuw9pPv-NTkrsLrB2Z2YO5ks5gnnOA at mail.gmail.com>
> >> Content-Type: text/plain; charset="UTF-8"
> >>
> >> Dear All,
> >>
> >> I am trying to use mixed-effects modeling to analyze brain wave data
> >> from two groups of participants who were presented with two distinct
> >> stimuli. The data points (scalp voltage) were gathered from the same
> >> set of 9 nearby channels in each participant. So I have the following
> >> factors:
> >>
> >> - voltage: the dependent variable
> >> - group: the between-participant/within-item variable for groups A and B
> >> - item: the within-participant variable (note there are exactly two
> >>   items, P and Q)
> >> - participant: identifying each participant across the two groups
> >> - channel: identifying each channel (note that data from these channels
> >>   in a nearby region tend to display similar, thus correlated, patterns
> >>   within the same participant)
> >>
> >> The hypothesis is that only group B will show a difference between P
> >> and Q (i.e., there should be an interaction effect). So I established
> >> a mixed-effect model using the lme4 package in R:
> >>
> >> model <- lmer(voltage ~ 1 + group + item + group:item +
> >>                   (1|participant) + (1|channel),
> >>               data = data, REML = FALSE)
> >>
> >> Questions:
> >>
> >> 1. I'm not sure if it is reasonable to add in participant as a random
> >>    effect, because it is related to group and seems to weaken the
> >>    effects of group. Would it be all right if I don't add it in?
> >> 2. Because the data from nearby channels of the same participant tend
> >>    to be correlated, I'm not sure if modeling participant and channel
> >>    as crossed random effects is all right. But meanwhile it also seems
> >>    strange to treat channel as nested within participant, because they
> >>    are the same set of channels across participants.
> >> 3. The interaction term is significant. But how should planned
> >>    comparisons be done (e.g., differences between groups A and B for P),
> >>    or is it even necessary to run planned comparisons? I saw suggestions
> >>    for t-tests, lsmeans, glht, or for more complicated methods such as
> >>    breaking down the model and subsetting the data:
> >>
> >>    # fit the model on the item == "P" subset only (data.table syntax)
> >>    data[, P_True := (item == "P")]
> >>    posthoc <- lmer(voltage ~ 1 + group + (1|participant) + (1|channel),
> >>                    data = data,
> >>                    subset = P_True,
> >>                    REML = FALSE)
> >>
> >>    But especially here, comparing only between two groups while modeling
> >>    participant as a random effect seems detrimental to the group effects,
> >>    and I'm not sure if it is really OK to do so. On the other hand,
> >>    because the data still contain non-independent data points (from
> >>    nearby channels), I'm not sure if simply using t-tests is all right.
> >>    Will non-parametric tests (e.g., Wilcoxon tests) do in such cases?
> >> 4. I suppose I don't need to model item as a random effect because
> >>    there are only two of them, one for each level, right?
> >>
> >> I would really appreciate your help!!
> >>
> >> Best regards,
> >>
> >> Paul
> >>
> >>
> >>
> >>
> >
>