[R-sig-ME] Related fixed and random factors and planned comparisons in a 2x2 design
Phillip Alday
phillip.alday at mpi.nl
Wed Aug 9 15:27:40 CEST 2017
Another old post ...
As a rule of thumb, I wouldn't fit separate models to subsets of the data:
an overarching model tends to fall on my preferred side of the
bias-variance / underfitting-overfitting tradeoff.
For post hoc comparisons between groups, you can use techniques like
marginal / least-squares means, implemented in R in packages like lsmeans.
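
A minimal sketch with lsmeans (using the model and variable names from the
thread below; the data frame name dat is assumed, not from the original
post):

    library(lme4)
    library(lsmeans)

    # model with the group-by-item interaction and crossed random intercepts
    m <- lmer(voltage ~ group * item + (1 | participant) + (1 | channel),
              data = dat)

    # estimated marginal (least-squares) means, with pairwise comparisons
    # of group within each level of item
    lsmeans(m, pairwise ~ group | item)
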
Phillip
On 06/07/2016 10:27 AM, paul wrote:
> Dear Phillip,
>
> Many thanks for these resources and replies. They are indeed very
> helpful. I suppose that after I've done the contrast coding, I still have
> to subset the data (e.g., singling out the data for P) and use a reduced
> mixed model, as illustrated earlier, to do planned comparisons between
> groups? Or is there any alternative way to do so?
>
> Best regards,
>
> Paul
>
> 2016-06-07 2:57 GMT+02:00 Phillip Alday <Phillip.Alday at unisa.edu.au>:
>
> In terms of contrast coding, two more helpful resources are:
>
> http://talklab.psy.gla.ac.uk/tvw/catpred/
>
> http://palday.bitbucket.org/stats/coding.html
>
> Channel makes sense as a random effect / grouping term for your
> particular design, *not* nested within participant. The implicit
> crossing given by (1|Participant) + (1|Channel) [omitting any slope
> terms to focus on the grouping variables] models (1) interindividual
> differences in the EEG and (2) differences between electrodes,
> because closely located electrodes can be thought of as samples from
> a population consisting of a given Region of Interest (ROI),
> especially if the electrode placement is somewhat symmetric. The
> differences resulting from variance in electrode placement between
> participants will be covered by the implicit crossing of these two
> random effects.
>
> Note that using channel as a random effect is somewhat more
> difficult if you're doing a whole-scalp analysis, as sampling across
> the whole scalp can be viewed as sampling from multiple ROIs, i.e.
> multiple populations. Two possible solutions are (1) to include ROI
> in the fixed effects and keep channel in the random effects, and (2)
> to model channel as two or three continuous spatial variables (e.g.
> displacement from midline or displacement from center based on 10-20
> coordinates, or spatial coordinates of the sort used in source
> localisation) in the fixed effects. In the case of (1), the channel
> random effect would then be modelling the typical variance within
> ROIs (because that's hopefully the major source of variance
> structured by channel left over after modelling ROI and your
> experimental manipulation). If this within-ROI variance differs greatly
> between ROIs, then this may be a sub-optimal modelling
> choice. In the case of (2), it might still make sense to
> additionally model channel as a random effect (i.e. the RE with the
> factor consisting of channel names, the FE with the continuous
> coordinates), see Thierry Onkelinx's posts on the subject and
> http://rpubs.com/INBOstats/both_fixed_random , but I haven't thought
> about this enough nor examined the resulting model fits.
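>
> A minimal sketch of both options (assuming the data frame is called dat
> and has a factor ROI plus hypothetical 10-20-derived coordinates x and y;
> those names are illustrative, not from the original thread):
>
>     library(lme4)
>
>     # Option (1): ROI as a fixed effect, channel kept as a random effect
>     m1 <- lmer(voltage ~ group * item + ROI +
>                    (1 | Participant) + (1 | Channel), data = dat)
>
>     # Option (2): continuous electrode coordinates as fixed effects,
>     # optionally keeping the channel factor as a random effect as well
>     m2 <- lmer(voltage ~ group * item + x + y +
>                    (1 | Participant) + (1 | Channel), data = dat)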
>
> Best,
> Phillip
>
> -----Original Message-----
> From: R-sig-mixed-models
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of paul
> Sent: Tuesday, 7 June 2016 5:27 AM
> To: Houslay, Tom <T.Houslay at exeter.ac.uk>
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Related fixed and random factors and planned
> comparisons in a 2x2 design
>
> Dear Tom,
>
> Thank you so much for these detailed replies and I appreciate your help!
>
> Sincerely,
>
> Paul
>
> 2016-06-06 21:51 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
>
> > Hi Paul,
> >
> >
> > I think you're right here in that actually you don't want to nest
> > channel inside participant (which led to that error message - sorry,
> > should have seen that coming!).
> >
> >
> > It's hard to know without seeing data plotted, but my guess from your
> > email is that you probably see some clustering both at individual
> > level and at channel level? Perhaps separate random effects, ie
> > (1|Participant) + (1|Channel), is the way to go (and then you
> > shouldn't have the problem as regards number of observations - instead
> > you'll have an intercept deviation for each of your N individuals, and
> > also intercept deviations for each of your 9 channels). You certainly
> > want to keep the participant intercept in though, as each individual
> > gets both items (right?), so you need to model that association. You
> > can use the variance components output from lmer to determine what
> > proportion of the total variance (conditional on your fixed effects)
> > is explained by each of these components; e.g.,
> > V(individual)/(V(individual) + V(channel) + V(residual)) would give you
> > the proportion explained by differences among individuals in their
> > voltage (a small sketch of this calculation follows the link below). It
> > would be cool to know whether differences among individuals, or among
> > channels, are driving the variation that you find. I think the sjp.lmer
> > function from the sjPlot package would be useful for looking at the
> > levels of your random effects:
> >
> >
> > http://strengejacke.de/sjPlot/sjp.lmer/
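> >
> > A minimal sketch of that proportion-of-variance calculation (assuming
> > an lmer fit named m with grouping factors named participant and channel):
> >
> >     vc <- as.data.frame(VarCorr(m))    # variance components table
> >     v  <- setNames(vc$vcov, vc$grp)    # participant, channel, Residual
> >     v["participant"] / sum(v)          # proportion due to individuals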
> >
> >
> > As for 'contrasts', again I haven't used that particular package, but
> > from a brief glance it looks like you're on the right track - binary
> > coding is the 'simple coding' as set out here:
> >
> >
> > http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
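> >
> > A minimal sketch of that simple (centred) coding for two-level factors
> > (assuming a data frame named dat with factors group and item):
> >
> >     dat$group <- factor(dat$group)
> >     dat$item  <- factor(dat$item)
> >     contrasts(dat$group) <- c(-0.5, 0.5)   # A = -0.5, B = +0.5
> >     contrasts(dat$item)  <- c(-0.5, 0.5)   # P = -0.5, Q = +0.5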
> >
> >
> > Good luck!
> >
> >
> > Tom
> >
> >
> > ------------------------------
> > *From:* paul <graftedlife at gmail.com>
> > *Sent:* 06 June 2016 20:06:02
> > *To:* Houslay, Tom
> > *Cc:* r-sig-mixed-models at r-project.org
> > *Subject:* Re: Related fixed and random factors and planned
> > comparisons in a 2x2 design
> >
> > Dear Tom,
> >
> > Many thanks for these very helpful comments and suggestions! May I ask
> > a few further questions:
> >
> > 1. I've been considering whether to cross or to nest the random
> > effects for quite a while. Data from the same channel across
> > participants do show corresponding trends (thus a bit different from
> > the case when, e.g., sampling nine neurons from the same individual).
> > Would nesting channel within participant deal with that relationship?
> >
> > 2. I actually also tried nesting channel within participant. However,
> > when I proceeded to run planned comparisons (I guess I'd better run
> > them because of their theoretical importance) based on this
> > mixed-effects modeling approach (as illustrated in the earlier mail but
> > with the random factor as (1|participant/channel), to maintain
> > consistency of analytical methods), R gave me an error message:
> >
> > Error: number of levels of each grouping factor must be < number of
> > observations
> >
> >
> > I think this is because in my data, each participant contributes only
> > one data point per channel and thus there are not enough data points. I
> > guess that probably means I can't go on in this direction to run the
> > planned comparisons... (?) I'm not quite sure how contrasts based on
> > binary dummy variables would be done and will try to explore that
> > further. But before fitting the mixed model, I had already set up
> > orthogonal contrasts for group and item in the dataset using the
> > function contrasts(). Does this have anything to do with what you
> > meant?
> >
> > 3. I worried about pseudoreplication when participant ID is not
> > included. Concerning this point, it later occurred to me that
> > pseudoreplication usually arises when multiple responses from the same
> > individual are grouped in the same cell, rendering the data within that
> > cell non-independent (similar to the case of repeated-measures ANOVA?
> > sorry if my understanding is wrong...). But, as mentioned earlier, in my
> > data each participant contributes only one data point per channel. When
> > channel alone is already modeled as a random factor, would that mean
> > that all data points within a cell come from different participants,
> > and thus that the independence assumption is satisfied in this case?
> > (Again, I'm sorry if my concept is wrong and would appreciate
> > instruction on this point...)
> >
> > Many, many thanks!
> >
> > Paul
> >
> > 2016-06-06 19:10 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
> >
> >> Hi Paul,
> >>
> >> I don't think anyone's responded to this yet, but my main point would
> >> be that you should check out Schielzeth & Nakagawa's 2012 paper
> >> 'Nested by design'
> >> (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00251.x/abstract)
> >> for a nice rundown on structuring your model for this type of data.
> >>
> >> It may also be worth thinking about how random intercepts work in a
> >> visual sense; there are a variety of tools that help you do this from
> >> a model (packages sjPlot, visreg, broom), or you can just plot
> >> different levels yourself (e.g. consider plotting the means for AP, AQ,
> >> BP, BQ; the same with mean values from each individual overplotted
> >> around these group means; and even the group means with all points
> >> shown, perhaps coloured by individual - ggplot is really useful for
> >> getting this type of figure together quickly).
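> >>
> >> A rough ggplot2 sketch of that kind of figure (assuming a data frame
> >> named dat with columns voltage, group, item, and participant):
> >>
> >>     library(ggplot2)
> >>     ggplot(dat, aes(x = interaction(group, item), y = voltage)) +
> >>       geom_jitter(aes(colour = participant), width = 0.1, alpha = 0.3,
> >>                   show.legend = FALSE) +       # all points, by individual
> >>       stat_summary(fun = mean, geom = "point", size = 4)  # cell means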
> >>
> >> As to some of your other questions:
> >>
> >> 1) You need to keep participant ID in. I'm not 100% on your data
> >> structure from the question, but you certainly seem to have repeated
> >> measures for individuals (I'm assuming that groups A and B each
> >> contain multiple individuals, none of whom were in both groups, and
> >> each of whom was shown both items P and Q, in a random order).
> >> It's not surprising that the effects of group are weakened if you
> >> remove participant ID, because you're then effectively entering
> >> pseudoreplication into your model (ie, telling your model that all
> >> the data points within a group are independent, when that isn't the case).
> >>
> >> 2) I think channel should be nested within individual, with a model
> >> something like model <- lmer(voltage ~ group * item +
> >> (1|participant/channel), data = ...)
> >>
> >> 3) This really depends on what your interest is. If you simply want
> >> to show that there is an overall interaction effect, then the
> >> p-value from a likelihood ratio test of the model with/without the
> >> interaction term gives the significance of this interaction, and then
> >> a plot of predicted values for the fixed effects (with data
> >> overplotted if possible) should show the trends (see the sketch after
> >> point 4 below). You could also use binary dummy variables to make more
> >> explicit contrasts, but it's worth reading up on these a bit more. I
> >> don't really use these types of comparisons very much, so I can't
> >> comment further I'm afraid.
> >>
> >> 4) Your item is like treatment in this case - you appear to be more
> >> interested in the effect of different items (rather than how much
> >> variation 'item' explains), so keep this as a fixed effect and not as
> >> random.
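> >>
> >> A minimal sketch of the likelihood ratio test mentioned in point 3
> >> (assuming a data frame named dat and ML fits, i.e. REML = FALSE):
> >>
> >>     library(lme4)
> >>     m_full <- lmer(voltage ~ group * item + (1|participant) + (1|channel),
> >>                    data = dat, REML = FALSE)
> >>     m_red  <- lmer(voltage ~ group + item + (1|participant) + (1|channel),
> >>                    data = dat, REML = FALSE)
> >>     anova(m_red, m_full)   # LRT for the group:item interaction term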
> >>
> >> Hope some of this is useful,
> >>
> >> Tom
> >>
> >>
> >> ________________________________________
> >>
> >>
> >> Message: 1
> >> Date: Fri, 3 Jun 2016 14:28:59 +0200
> >> From: paul <graftedlife at gmail.com>
> >> To: r-sig-mixed-models at r-project.org
> >> Subject: [R-sig-ME] Related fixed and random factors and planned
> >> comparisons in a 2x2 design
> >> Message-ID:
> >> <CALS4JYfoTbhwhy8S0kHePuw9pPv-NTkrsLrB2Z2YO5ks5gnnOA at mail.gmail.com>
> >> Content-Type: text/plain; charset="UTF-8"
> >>
> >> Dear All,
> >>
> >> I am trying to use mixed-effects modeling to analyze brain wave data
> >> from two groups of participants who were presented with two distinct
> >> stimuli. The data points (scalp voltage) were gathered from the same
> >> set of 9 nearby channels in each participant. So I have the following
> >> factors:
> >>
> >> - voltage: the dependent variable
> >> - group: the between-participant/within-item variable for groups A and B
> >> - item: the within-participant variable (note there are exactly two
> >>   items, P and Q)
> >> - participant: identifying each participant across the two groups
> >> - channel: identifying each channel (note that data from these channels
> >>   in a nearby region tend to display similar, thus correlated, patterns
> >>   within the same participant)
> >>
> >> The hypothesis is that only group B will show a difference between P
> >> and Q (i.e., there should be an interaction effect). So I established
> >> a mixed-effect model using the lme4 package in R:
> >>
> >> model <- lmer(voltage ~ 1 + group + item + group:item +
> >>                   (1|participant) + (1|channel),
> >>               data = data, REML = FALSE)
> >>
> >> Questions:
> >>
> >> 1. I'm not sure if it is reasonable to add in participant as a random
> >>    effect, because it is related to group and seems to weaken the
> >>    effects of group. Would it be all right if I don't add it in?
> >> 2. Because the data from nearby channels of the same participant tend
> >>    to be correlated, I'm not sure if modeling participant and channel
> >>    as crossed random effects is all right. But meanwhile it also seems
> >>    strange to treat channel as nested within participant, because they
> >>    are the same set of channels across participants.
> >> 3. The interaction term is significant. But how should planned
> >>    comparisons be done (e.g., differences between groups A and B for P),
> >>    or is it even necessary to run planned comparisons? I saw suggestions
> >>    for t-tests, lsmeans, glht, or for more complicated methods such as
> >>    breaking down the model and subsetting the data:
> >>
> >>    # fit the model on the item == "P" subset only (data.table syntax)
> >>    data[, P_True := (item == "P")]
> >>    posthoc <- lmer(voltage ~ 1 + group + (1|participant) + (1|channel),
> >>                    data = data,
> >>                    subset = P_True,
> >>                    REML = FALSE)
> >>
> >>    But especially here, comparing only between two groups while modeling
> >>    participant as a random effect seems detrimental to the group effects,
> >>    and I'm not sure if it is really OK to do so. On the other hand,
> >>    because the data still contain non-independent data points (from
> >>    nearby channels), I'm not sure if simply using t-tests is all right.
> >>    Will non-parametric tests (e.g., Wilcoxon tests) do in such cases?
> >> 4. I suppose I don't need to model item as a random effect because
> >>    there are only two of them, one for each level, right?
> >>
> >> I would really appreciate your help!!
> >>
> >> Best regards,
> >>
> >> Paul
> >>
> >>
> >>
> >>
> >
>