[R-sig-ME] Specify the appropriate model for an Event Related Potentials (ERPs) study: what should I do with trial order (and other terms)

Paolo Canal paolo.canal at iusspavia.it
Fri Nov 11 11:43:28 CET 2016


Dear Phillip,

Thanks for your reply. I will try to motivate my choices better and add 
some further observations.

On 10/11/2016 13:26, Phillip Alday wrote:
> Dear Paolo,
>
> I forgot to state this, but make sure you're using sum coding for your variables! Dale Barr and I have both written things on this:
>
> http://talklab.psy.gla.ac.uk/tvw/catpred/
>
> http://palday.bitbucket.org/stats/coding.html

Thanks for the links. I have already coded my categorical variables with 
sum contrasts, or more precisely with deviation coding, and centred the 
continuous predictors on 0. This should also help to reduce the 
correlations in the variance-covariance matrices, and indeed the 
estimated correlations are all below 0.2.
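
Concretely, this is roughly what I did (a sketch: the data frame dat, 
the variable names and the ±0.5 coding are my own choices, nothing 
prescribed):

dat$typicality <- factor(dat$typicality, levels = c("Atypical", "Typical"))
contrasts(dat$typicality) <- c(-0.5, 0.5)  # deviation coding: the slope is
                                           # the Typical - Atypical difference
dat$edu <- factor(dat$edu, levels = c("Low", "Hi"))
contrasts(dat$edu) <- c(-0.5, 0.5)
# centre the continuous predictors on 0, keeping their original scale
dat$frequency <- as.numeric(scale(dat$frequency, scale = FALSE))
dat$trial     <- as.numeric(scale(dat$trial, scale = FALSE))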

>
> In R, you can do this by placing the following at the top of your script (this variant will give you better named contrasts than the usual way of doing it):
>
>> library(car)
>> options(contrasts = c("contr.Sum","contr.poly"))
> Best,
> Phillip
>   
>
>
>> On 10 Nov 2016, at 22:47, Phillip Alday <phillip.alday at unisa.edu.au> wrote:
>>
>> Dear Paolo,
>>
>> Using Subject (or Participant, if you want to avoid some ambiguity in a linguistic context) and Item as grouping factors is fairly standard in the ERP literature, as is the use of LMMs with the mean amplitude in a given time window as the dependent variable (at least when mixed models are used; many colleagues seem reluctant to abandon ANOVA despite Clark 1973, Judd, Westfall and Kenny 2012 and many other papers emphasising the advantages of explicit regression over ANOVA). GAMMs over the whole time course of the ERP are still relatively new and not widely used, although people like Harald Baayen and Stefanie Nickels are working on this. Beyond the increased computational complexity, GAMMs also suffer from the whole *additive* bit, which can be addressed, but is difficult for cognitive neuroscience, where the interactions are often the most interesting bits.

I would add that the work of van Rij and Wieling is also inspiring.

>> I hesitate to use Channel as a grouping variable, although this is the approach taken by e.g. Payne et al (2015), because the distribution of effects for channels is not multivariate normal (the assumed distribution in lme4) for most references. Indeed, we know that Channel effects vary systematically (this is the whole notion of "topography" in EEG), and I personally feel that we should actually model channel effects parametrically using a suitable coordinate system, such as the one that the 10-20 system is actually based on (angular deviations from the apical electrode). However, this is again much more complex. Including Channel as a categorical fixed effect is also not particularly satisfying, as this will add n_chans-1 coefficients for the main effect of channel as well as many interaction terms. You could potentially have regions of interest (ROIs) / topographical factors (left-right, anterior-posterior) in your model fixed effects and then either ignore channel (as is actually done for the traditional rmANOVA analysis of ERP data) or include an intercept-only random-effect term for channel under the assumption that there is a multivariate normal distribution of effects within a given ROI. However, this assumption will generally only hold for high-density setups with topographically small ROIs. Larger ROIs will of course show systematic variation as you move from one edge to another. And you will also run into problems if the number of channels within each ROI is small, as this will bias your random-effect estimates: remember that random effects are *variance* components and, like all variance estimates, they require several observed levels for accurate estimation. (One rule of thumb I've heard is 10ish.) And as Payne et al saw in their own data, the channel factor typically doesn't help with model fit anyway and can hurt convergence, so I would just leave it out completely if you don't want to model it parametrically.

I think we already had an exchange on this point a few months ago. I 
may not have specified that the analyses are carried out on two (small) 
subsets of contiguous electrodes (11 and 13 electrodes out of 60), where 
I would assume that all electrodes are representative of the effect, 
with some variation. For this reason I included channel as an 
intercept-only random-effect term. I keep doing this because model fit 
improves and I have had no convergence problems. I do not use this 
factor in the fixed-effect structure, as I am assuming that differences 
between the selected electrodes are not relevant.
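
For what it is worth, this is how I checked that the channel term earns 
its keep (a sketch with hypothetical object names; since the two models 
differ only in the random part, I keep the REML fits for the comparison):

library(lme4)
m_noch <- lmer(eeg ~ 1 + typicality * edu * frequency +
                 (1 | subject) + (1 | item), data = dat)
m_ch   <- update(m_noch, . ~ . + (1 | channel))
# refit = FALSE keeps the REML fits; note that the LRT for a variance
# component tested at its boundary (0) is conservative
anova(m_ch, m_noch, refit = FALSE)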

>> I'm not sure why you mentioned "semantic category" in your random-effect structure. In my experience, semantic category is typically something for which we care about the individual levels (of which there are not that many in any one experiment) and so are better modelled by fixed effects. (In other words, we care about the differences in processing between Furniture and People, not just that different categories show differences.) Items are good random effects, semantic categories are not. And it's not a problem if each item only belongs to certain semantic categories. lme4 can handle such nesting structures. If you only have a few semantic categories, then you'll also run into computational / statistical trouble with treating them as random effects (see the last paragraph).

I do not think this is the way to go for our experimental design. In 
fact we focused on a large number (85) of semantic categories (maybe you 
are thinking of a handful of categories such as living/non-living), and 
we were interested in Typicality across all these categories. I also 
think it is correct to model random slopes for Typicality across 
semantic categories: for each semantic category we have 2 different 
words (one Typical and one Atypical member), so the manipulation is 
within item and the single item is the semantic category. I believe that 
including Typicality as a by-item (i.e. by-semantic-category) random 
slope makes the analysis more conservative and the model "more" maximal: 
each semantic category is associated with a larger or smaller difference 
in typicality between the pair of words, and adding the slopes relaxes 
the assumption that this difference is the same across categories.
>>
>> In short, I would propose the following model structure:
>>
>> mean_voltage ~ 1 + typicality * education * frequency * semantic_category + (1+... | subject) + (1+ ... | item)

I would agree with you if we were looking at differences between a few 
categories. But because we are not, I guess the following would be ok:

mean_voltage ~ 1 + typicality * education * frequency + (1 + ... | 
subject) + (1 + ... | item) [not necessarily + (1 | ch)]



>> Your particular choice of which slopes to include for each random-effect grouping term is a difficult one, as has been highlighted by Baayen et al (2008), Barr et al (2013), Barr (2013) and the recent set of Bates preprints on parsimonious mixed models, as well as a number of threads on this mailing list. Generally, I start off with main effects and if that model converges, great, if not, then I reduce more. In my experience with EEG studies on language, interactions in the random-effects structure just lead to overly complex models that take a long time to compute, fail to converge or show other signs of being degenerate. In other words, I would consider the following RE structure for your data:
>>
>>
>> (1 +  typicality + frequency + semantic_category | subject) + (1 +  education  | item)
>>
>> I left a lot out of the RE structure for Item because, assuming that each Item represents a single lemma / word, then it doesn't have different frequencies / categories / typicalities and so it doesn't make sense to consider a variable effect for something that is constant within the grouping unit. Similarly for education and subject.

As explained above, I would keep the Typicality slope for item, because 
an item is not a single lemma but a pair of lemmas:

(1 + typicality + frequency | subject) + (1 + education + typicality | item)

Now, this is a random structure that I like because it is simpler than 
the one I had.

>> If you don't model semantic category explicitly, then your item random effect should absorb the variance due to it.  You just won't have an explicit term in the model to point to that only describes the effect of semantic category (as item-level variance will cover a whole host of other effects related to the differences between words).
>>
>> (For posterity -- I think we discussed some of these issues previously on r-help: https://stat.ethz.ch/pipermail/r-help/2015-September/432561.html )
>>
>> To address some of your explicit questions more directly:
>>
>>> - Am I allowed to use the same complex random structure to compare the
>>> likelihood of models that have "simpler" fixed effects? In principle I
>>> guess it is correct to have the same random structure across comparisons.
>> Not quite. You should not have random slopes for effects not in the fixed-effect model structure because the mixed-model formulation used by lme4 assumes zero-mean for the random effects. In other words, lme4 random effects are estimates of how much the different grouping factors lead to variance around the population-level estimates delivered by the fixed effects.

OK, so random adjustments for trial order would only mitigate the 
population-level effect of trial order, but would not help to better 
estimate the other terms (by "absorbing" some variance) if I do not also 
include the term in the fixed-effect structure. If I include it only in 
the fixed part, as a main effect, the model can explain the variance 
associated with fatigue or adaptation to the experimental setting; and 
it should not affect (interact with) the manipulation of Typicality if 
the lists were properly "randomized", so I can motivate the choice of 
not looking at the interaction between trial order and the experimental 
factors. It makes sense.

>>> - I am not interested in the effect of serial presentation (trial
>>> order), as it increases the order of the highest interaction. Is it
>>> appropriate to use it in the random structure only, or should I always
>>> discuss it in interaction with my factors of interest?
>>
>> No, for the reason above. But you could have the order of serial presentation as a non-interacting / main-effect-only fixed effect. Also, if you did the usual thing and you counterbalanced presentation order (e.g. via several different pseudo-random presentation orders/lists) across participants, then the usual assumption is that any effects of presentation order cancel out across participants. The item grouping factor will also absorb some of this variance.
>>
>> Best,
>> Phillip

To sum up: I will start with the following model, driven by the 
experimental design:

mod_between <- lmer(eeg ~ trial + typicality * frequency * edu ...)

The random structure will be conservative but not over-specified:

(1 + typicality + frequency | subject) + (1 + education + typicality | item)
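
Spelled out as a complete call, the starting model would be something 
like this (still a sketch; dat is a hypothetical long-format data frame 
with one row per subject x item x channel mean amplitude):

mod_between <- lmer(eeg ~ trial + typicality * frequency * edu +
                      (1 + typicality + frequency | subject) +
                      (1 + edu + typicality | item) +
                      (1 | channel),  # kept only because it helps fit here
                    data = dat)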

To test the influence of the cognitive variables on the ERP pattern 
associated with typicality, I will keep the very same random structure 
and perform likelihood ratio tests between nested models, such as:

model_PredX <- lmer(eeg ~ trial + typicality * frequency * edu + PredictorX ...)
model_PredXinteraction <- lmer(eeg ~ trial + typicality * frequency * edu +
                               PredictorX + typicality:PredictorX ...)
anova(model_PredX, model_PredXinteraction)
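
One detail I will keep in mind: these nested models differ in their 
fixed effects, so the likelihood ratio test must be based on ML rather 
than REML fits. As far as I understand, anova() in lme4 already refits 
both models with ML before computing the test (unless refit = FALSE is 
given), but it can also be made explicit:

anova(refitML(model_PredX), refitML(model_PredXinteraction))  # refitML() is lme4's helper for refitting with ML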

Keeping a less complex random structure also has the benefit of saving 
some degrees of freedom, making it easier to detect higher-order 
interactions.

My mind seems now a little bit clearer.
Thank you very much!
Paolo

>>> On 8 Nov 2016, at 21:48, Paolo Canal <paolo.canal at iusspavia.it> wrote:
>>>
>>> Dear Mixed-Group,
>>> I have acquired my data from one experiment using a rather common
>>> paradigm in psycholinguistics. The experiment aimed at investigating the
>>> electrophysiological correlates of reading Typical (e.g., "chair") vs
>>> Atypical (e.g., "foot rest") members of a number (N=85) of semantic
>>> categories (e.g., "a kind of Furniture"). In particular, we were
>>> interested in looking at differences associated with Education level
>>> (University students, N=24, vs non-University students, N=23), and
>>> three individual predictors. My issue is how to deal with some factors
>>> that are absolutely important in allowing for a better fit of the
>>> model, but make interpretations too "complicated".
>>>
>>> The two main factors of interest are thus Typicality (categorical,
>>> Typical vs Atypical) and Education (categorical, Hi vs Low Education).
>>> I already know that the choice of treating these factors as dichotomous
>>> is questionable, but, I believe, defensible: although the measure of
>>> Typicality is actually continuous (a proportion varying from 0 to 1),
>>> it is paired within each semantic category, because when we selected
>>> the materials we took the pair of exemplars that showed the largest
>>> difference in Typicality; so within each category it is the difference
>>> in typicality that actually matters. Treating Education as categorical
>>> is less defensible, but in some way we wanted to compare the predictive
>>> power of this variable with more continuous variables representing a
>>> set of abilities (3 cognitive measures, one of which is moderated by
>>> years of education and age), possibly to show that some brain
>>> mechanisms are better described by accounting for individual variation
>>> rather than group differences.
>>>
>>> I used lmer in lme4 to analyze the effect of my independent variables on
>>> the average EEG voltage (continuous) from a set of EEG channels in two
>>> different time windows of interest (I know a GAMM would be even more
>>> appropriate than an LMM, as what I am dealing with here are time series,
>>> but I am not yet ready to try).
>>>
>>> I first determined the random-effect structure, selecting three grouping
>>> factors (subject, semantic category and channel), which are clusters of
>>> repeated measures: for each item I have several subjects, for each
>>> subject I have several items, and for each channel I have several items
>>> and subjects (perhaps channel might be nested in subject and item rather
>>> than stand alone; any hints?). For each grouping factor, I allowed
>>> intercepts to vary (e.g., 1|subject). Moreover, because I wanted to be
>>> conservative and the data are rather well behaved (no convergence
>>> failures, no variance estimates of 0 or correlations of ±1, no overly
>>> high correlations between terms), I included a set of terms to adjust
>>> by-subject and by-item slopes. I allowed by-subject and by-item slope
>>> adjustments for Typicality (as it varies within subjects and within
>>> semantic categories) and by-item slope adjustments for Education level.
>>>
>>> Things get more complicated when thinking of the influence of two
>>> variables that actually account for a lot of variation in the data:
>>> frequency of use of the words, and trial order. The first variable is
>>> also theoretically important and I want to include it as a fixed effect;
>>> the second variable improves model fit, but because it makes the results
>>> less straightforward to interpret, I would rather not include it in the
>>> fixed part of the model.
>>>
>>> This brings me to the fixed-effect structure and the actual questions
>>> for the list:
>>>
>>> The initial design was very simple (2x2 plus covariates). The strategy
>>> was to fit the simple model Typicality + Frequency and evaluate whether
>>> adding the interaction between Education (or the three covariates) and
>>> Typicality leads to a relevant increase in likelihood, always using
>>> the same random structure (the complex one).
>>>
>>> Now I am not so sure this is appropriate and I have a list of doubts:
>>> - Am I allowed to use the same complex random structure to compare the
>>> likelihood of models that have "simpler" fixed effects? In principle I
>>> guess it is correct to have the same random structure across comparisons.
>>> - I am not interested in the effect of serial presentation (trial
>>> order), as it increases the order of the highest interaction. Is it
>>> appropriate to use it in the random structure only, or should I always
>>> discuss it in interaction with my factors of interest?
>>>
>>> Thanks for any help
>>> Paolo
>>>


