[R] Appropriate specification of random effects structure for EEG/ERP data: including Channels or not?
Paolo Canal
paolo.canal at iusspavia.it
Wed Sep 23 12:46:46 CEST 2015
Dear r-help list,
I work with EEG/ERP data and this is the first time I am using linear
mixed models (LMMs) to analyze my data (with lme4).
The experimental design is a 2X2: one manipulated factor is agreement,
the other is noun (agreement being within subjects and items, and noun
being within subjects and between items).
The data matrix is 31 subjects * 160 items * 33 channels. In ERP
research, the distribution of the EEG amplitude differences (in a time
window of interest) is important: we care about knowing whether a
negative difference occurs at Parietal or Frontal electrodes. At
the same time, information from a single channel is often too noisy, so
channels are grouped into topographic factors for evaluating differences
in distribution. In the present case I have assigned each channel to one
of three levels of two factors, i.e., Longitude (Anterior, Central,
Parietal) and Medial (Left, Midline, Right): for instance, one channel
is Anterior and Left. With traditional ANOVAs, channels from the same
level of the topographic factors are averaged before variance is
evaluated, which also has the benefit of reducing the noise picked up by
the electrodes.
I am having trouble deciding on the random-effects structure of my
model. Very few examples of LMMs applied to ERP data exist (e.g., Newman,
Tremblay, Nichols,
Neville & Ullman, 2012) and little detail is provided about the
treatment of channel. I feel it is a tricky term but very important to
optimize fit. Newman et al. say "data from each electrode within an ROI
were treated as repeated measures of that ROI". In Newman et al., the
ROIs are the 9 regions deriving from Longitude X Medial (Anterior-Left,
Anterior-Midline, Anterior-Right, Central-Left ... and so on), so in a
way they treated each ROI separately and not according to the relevant
dimensions of Longitude and Medial.
We used the following specifications in lmer:
[fixed effects specification: µV ~ Agreement * Noun * Longitude * Medial
* (cov1 + cov2 + cov3 + cov4)] (the terms within parentheses are a series
of individual covariates, most of which are continuous variables)
[random effects specification: (1 + Agreement*Noun | subject) +
(1 + Agreement | item) + (1 | longitude:medial:channel)]
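As a rough sketch, the two specifications above could be combined into a
single lme4-style formula. The column names are assumptions: uV stands in
for the dependent variable, cov1 to cov4 for the covariates, and the
topographic factors are written Longitude and Medial throughout because R
names are case-sensitive. The block only builds and inspects the formula;
the lmer() call is left as a comment because it needs the real data.

```r
# Sketch of the model described above as one lme4-style formula.
# All column names are placeholders, not the actual data set.
full_formula <- uV ~ Agreement * Noun * Longitude * Medial *
  (cov1 + cov2 + cov3 + cov4) +
  (1 + Agreement * Noun | subject) +   # by-subject intercept and slopes
  (1 + Agreement | item) +             # by-item intercept and slope
  (1 | Longitude:Medial:channel)       # channel intercepts nested in region

# Variables the formula expects to find in the data frame
model_vars <- all.vars(full_formula)
print(model_vars)

# With lme4 installed and a data frame `erp` (hypothetical), an ML fit would be:
# library(lme4)
# fullmod <- lmer(full_formula, data = erp, REML = FALSE)
```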
What I care the most about is the last term
(1|longitude:medial:channel). I chose this specification because I
thought that allowing each channel to have different intercepts in the
random structure would affect the estimation of the topographic fixed
effects (Longitude and Medial) in which channel is nested. Unfortunately
a reviewer commented that since "channel is not included in the fixed
effects I would probably leave that out".
But each channel is a repeated measure of the EEG amplitude inside the
two topographic factors, and random terms do not have to be in the fixed
structure, otherwise we would also include subjects and items in the
fixed effects structure. So I feel that including channels as a
random effect is correct, and that nesting them in longitude:medial
allows us to relax the assumption that the effect in the EEG always has
the same longitude:medial distribution. But I might be wrong.
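To make the nesting concrete, here is a small base-R sketch of the
grouping factors implied by the two candidate random terms. The electrode
labels and their assignment to regions are a made-up toy layout, not the
actual 33-channel montage. Assuming every channel label is unique, the
nested term has one level per channel, whereas longitude:medial alone has
one level per region.

```r
# Toy electrode layout (hypothetical labels and assignments)
layout <- data.frame(
  channel   = c("F3", "Fz", "F4", "C3", "Cz", "C4",
                "P3", "Pz", "P4", "F7", "P8"),
  Longitude = c("Anterior", "Anterior", "Anterior",
                "Central", "Central", "Central",
                "Parietal", "Parietal", "Parietal",
                "Anterior", "Parietal"),
  Medial    = c("Left", "Midline", "Right", "Left", "Midline", "Right",
                "Left", "Midline", "Right", "Left", "Right")
)

# Grouping factor implied by (1 | Longitude:Medial:channel)
nested <- interaction(layout$Longitude, layout$Medial, layout$channel,
                      drop = TRUE)
# Grouping factor implied by (1 | Longitude:Medial)
roi <- interaction(layout$Longitude, layout$Medial, drop = TRUE)

n_channel_groups <- nlevels(nested)  # one level per channel
n_roi_groups <- nlevels(roi)         # one level per region
print(c(channels = n_channel_groups, regions = n_roi_groups))
```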
I therefore compared fit (ML) with anova() between the model with
(1|longitude:medial:channel), the same model without that term, and a
third model with the simpler term (1|longitude:medial).
Fullmod vs Nochannel:
         Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
modnoch 119 969479 970653 -484621   969241
fullmod 120 968972 970156 -484366   968732 508.73      1  < 2.2e-16 ***
The difference in fit is remarkable (no variance components with
estimates close to zero; no correlation parameters with values close to
±1).
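The numbers in the first comparison can be checked by hand. The sketch
below uses only the rounded values printed in the table, so the statistic
comes out as 509 instead of the reported 508.73. It assumes the usual
definitions for ML fits: the likelihood-ratio statistic is the difference
in deviance, and AIC is deviance plus twice the number of parameters.

```r
# Values copied from the anova() table (rounded as printed)
dev_modnoch <- 969241; df_modnoch <- 119
dev_fullmod <- 968732; df_fullmod <- 120

# Likelihood-ratio statistic and its degrees of freedom
chisq <- dev_modnoch - dev_fullmod
chi_df <- df_fullmod - df_modnoch
p_value <- pchisq(chisq, df = chi_df, lower.tail = FALSE)

# AIC = deviance + 2 * number of parameters
aic_modnoch <- dev_modnoch + 2 * df_modnoch
aic_fullmod <- dev_fullmod + 2 * df_fullmod

print(c(chisq = chisq, chi_df = chi_df, p_value = p_value))
print(c(aic_modnoch = aic_modnoch, aic_fullmod = aic_fullmod))
```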
Fullmod vs SimplerMod:
            Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
fullmod    120 968972 970156 -484366   968732
simplermod 120 969481 970665 -484621   969241     0      0          1
Here the number of parameters to estimate in fullmod and simplermod is
the same, but the improvement in fit is substantial (-509 BIC). So I
guess that, although the chi-square is not significant, we do have a
strong increase in fit. As I understand this, a model with better fit
will give more accurate estimates, and I would be inclined to keep the
fullmod random structure.
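Because both models have 120 parameters, the likelihood-ratio test has
zero degrees of freedom, so the comparison has to rest on the information
criteria. A minimal sketch using only the values printed in the two
tables is given here. It also shows that simplermod has the same deviance
as modnoch, so its AIC is higher by exactly the penalty for one extra
parameter.

```r
# Values copied from the two anova() tables (rounded as printed)
ic <- data.frame(
  model    = c("fullmod", "simplermod", "modnoch"),
  Df       = c(120, 120, 119),
  AIC      = c(968972, 969481, 969479),
  BIC      = c(970156, 970665, 970653),
  deviance = c(968732, 969241, 969241)
)

full    <- ic[ic$model == "fullmod", ]
simpler <- ic[ic$model == "simplermod", ]
noch    <- ic[ic$model == "modnoch", ]

# Same number of parameters, so the penalties cancel and both
# criteria differ by the same amount as the deviance
delta_df  <- simpler$Df - full$Df
delta_aic <- simpler$AIC - full$AIC
delta_bic <- simpler$BIC - full$BIC

# simplermod vs modnoch: identical deviance, one extra parameter
same_deviance <- simpler$deviance == noch$deviance
aic_penalty   <- simpler$AIC - noch$AIC

print(c(delta_df = delta_df, delta_aic = delta_aic, delta_bic = delta_bic))
print(c(same_deviance = same_deviance, aic_penalty = aic_penalty))
```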
But perhaps I am missing something or I am doing something wrong. Which
is the correct random structure to use?
Feedback is very much appreciated. I often find answers on the list,
and this is the first time I have posted a question.
Thanks,
Paolo