[R-sig-ME] Specifying the correct LMM for 'unusual' data
Maarten Jung
Maarten.Jung at mailbox.tu-dresden.de
Fri Jan 26 17:54:20 CET 2018
Hi Tom,
your suggestions for the categorical predictors make sense and are
conceptually a much better solution than collapsing everything into a
single predictor - many thanks for that!
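To make sure I understood the recoding, here is a minimal sketch of how I would split my original condition variable into your two predictors and move away from the default treatment contrasts. The toy data frame and the choice of sum-to-zero coding are my own assumptions; other contrast schemes may suit the hypotheses better.

```r
# Hypothetical toy data: one row per trial, using my original 3-level 'condition'.
d <- data.frame(
  condition = c("PM hit", "PM miss", "VS hit", "PM hit", "VS hit", "PM miss")
)

# Split 'condition' into the two predictors suggested in this thread.
d$task     <- factor(ifelse(grepl("^PM", d$condition), "PM", "VS"),
                     levels = c("PM", "VS"))
d$response <- factor(ifelse(grepl("hit$", d$condition), "hit", "miss"),
                     levels = c("hit", "miss"))

# Replace the default treatment contrasts with sum-to-zero contrasts
# (an assumption on my side, one of several possible codings).
contrasts(d$task)     <- contr.sum(2)
contrasts(d$response) <- contr.sum(2)

print(table(d$task, d$response))  # the VS-miss cell is empty in this toy data
print(contrasts(d$task))
```

With this coding each coefficient is expressed relative to the mean of the level means rather than relative to a reference level.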
I am aware of the partial pooling/shrinkage in the estimation process,
although for your suggestion there would literally be no data for the
VS-miss-condition. And I think that, in this case, the estimation would be
based on the younger children given that there are clearly more missing
data points for older children.
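To put numbers on this, the hit distribution from my original post can be tallied as follows. This is only a sketch in base R; the function and column names are made up for illustration.

```r
# Number of children with 0..6 PM hits, copied from my original post.
hits    <- 0:6
younger <- c(2, 9, 8, 12, 4, 4, 7)
older   <- c(2, 3, 4, 6, 7, 11, 12)

# Hypothetical helper: summarise one age group.
tally <- function(n_children) {
  c(children     = sum(n_children),
    pm_hits      = sum(n_children * hits),
    pm_misses    = sum(n_children * (6 - hits)),
    no_miss_cell = n_children[hits == 6],  # children without any PM miss
    no_hit_cell  = n_children[hits == 0])  # children without any PM hit
}

counts <- rbind(younger = tally(younger), older = tally(older))
print(counts)
```

With these counts the older group contributes 86 PM misses against 137 for the younger group, and 12 versus 7 children have an empty PM-miss cell.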
With my second question I was referring to the MAR (missing at random)
assumption of mixed models: "missing data on a given variable
may depend on other observed information, but does not depend on the data
that would have been observed but were in fact missing" (West, Welch &
Galecki, 2015).
I have read that covariates which 'predict' the nonavailability of data
points should be included in the model (but, to be honest, I have no idea how
this helps with the missing data) and wonder whether including, say, the number
of hits (if this is a better predictor than age_group) would improve the
model.
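To make the question concrete, this is roughly what I have in mind. It is a sketch on simulated data; the variable n_hits, the simulated values and the reduced random-effects structure (intercepts only, to keep the example light) are all my assumptions, and I am not claiming that adding the covariate makes the MAR assumption hold.

```r
set.seed(1)

# Hypothetical simulated data standing in for the real experiment:
# 20 participants, PM pictures 1-6 and VS pictures 7-18.
d <- expand.grid(participant = factor(1:20), picture = factor(1:18))
d$task      <- factor(ifelse(as.integer(d$picture) <= 6, "PM", "VS"))
d$age_group <- factor(ifelse(as.integer(d$participant) <= 10, "younger", "older"))

# PM trials are hits or misses at random; VS trials are all hits here.
d$response   <- factor(ifelse(d$task == "PM" & runif(nrow(d)) < 0.4, "miss", "hit"),
                       levels = c("hit", "miss"))
d$dwell_time <- rnorm(nrow(d), mean = 500, sd = 100)

# Per-participant number of PM hits, repeated on every row of that participant.
d$n_hits <- ave(as.numeric(d$task == "PM" & d$response == "hit"),
                d$participant, FUN = sum)

# Model formula with n_hits added as a participant-level covariate.
f <- dwell_time ~ age_group * task * response + n_hits +
  (1 | participant) + (1 | picture)

# Fit only if lme4 is available; the empty VS-miss cell makes the fixed-effects
# matrix rank deficient, so lmer drops the inestimable coefficients.
if (requireNamespace("lme4", quietly = TRUE)) {
  m <- lme4::lmer(f, data = d)
  print(lme4::fixef(m))
}
```

Since n_hits is constant within a participant, it can only explain between-participant differences in dwell time.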
Best,
Maarten
On Thu, Jan 25, 2018 at 4:08 PM, Tom Fritzsche <tom.fritzsche at uni-potsdam.de> wrote:
> Hi Maarten,
>
> I would not collapse the task and the kind of response (hit/miss) into
> one condition predictor. They are conceptually independent as task is
> a manipulated factor and response a measured value (covariate in this
> model). Also, one of them can vary within pictures, the other cannot (see
> model specification below).
>
> So my suggestion would be to have those two predictors:
>
> task: 2-level factor: PM, VS
> response: 2-level predictor: hit, miss
>
> Beware of how you specify the contrasts for (all of) the categorical
> predictors. The default treatment contrasts are most likely not the most
> straightforward basis for interpreting the model estimates.
>
> Regarding your questions:
>
> 1. Am I correct with the maximal linear mixed model specifications?
>
> With the changed predictors I think that this would be the maximal
> model. Response can vary also within pictures as each can be a hit or
> miss.
>
> lmer(dwell_time ~ age_group * task * response + (1 + task * response |
> participant) + (1 + response | picture), data)
>
>
> 2. I think that the data points in the PM-miss-condition (or
> PM-hit-condition) are not missing at random because they are missing if
> (and only if) there are 6 data points for the same participant in the
> PM-hit-condition (and vice versa). Do you think one has to worry about this
> and are there any suggestions how to deal with it?
>
> Imbalanced data sets and even missing design cells are not a problem
> for mixed models as they take the number of observations into
> account (shrinkage).
>
> Best,
> Tom
>
> ---
>
> Tom Fritzsche
> University of Potsdam
> Department of Linguistics
> Karl-Liebknecht-Str. 24-25
> 14476 Potsdam
> Germany
>
> office: 14.140
> phone: +49 331 977 2296
> fax: +49 331 977 2095
> e-mail: tom.fritzsche at uni-potsdam.de
> web: www.ling.uni-potsdam.de/~fritzsche
>
>
>
> On 25 January 2018 at 15:35, Maarten Jung
> <Maarten.Jung at mailbox.tu-dresden.de> wrote:
> > Dear list,
> >
> > a colleague of mine asked me to help her plan a linear mixed model
> > analysis and, as handling her data and the corresponding research questions
> > with lmer seems kind of tricky to me, I hope one of you can help me along.
> >
> > +++++++++++++++++++++++++++++++++++++
> > The experiment is as follows:
> >
> > Participants (46 younger and 45 older children) looked at a series of
> > pictures (one picture per trial) and had to solve two tasks consecutively:
> >
> > - Task block 1: Prospective memory (PM) task: while doing other tasks,
> > participants had to remember to press a specified button when they saw a
> > certain object
> > - Task block 2: Visual search (VS) task: participants had only this one task –
> > pressing a button as soon as possible when seeing a certain object
> >
> > Each child saw the same pictures in the same task block – pictures 1-6 in
> > task block 1 and pictures 7-18 in task block 2. Each picture was presented
> > only once, so there were different pictures in the task blocks.
> >
> > Trials with a target object in task 1 are classified by the
> > participant's reaction as PM hits (participant did press the button) or
> > PM misses (participant did not press the button). (Therefore, a certain
> > picture can be a PM hit trial for one child and a PM miss trial for
> > another.) As there were six trials (= pictures) that contained the target
> > object, each participant can have a minimum of zero and a maximum of six
> > PM hits, with the corresponding number of PM misses.
> > Here is the number of PM hits per age group:
> >
> > Younger children:
> > - 2 children: 0 hits
> > - 9 children: 1 hit
> > - 8 children: 2 hits
> > - 12 children: 3 hits
> > - 4 children: 4 hits
> > - 4 children: 5 hits
> > - 7 children: 6 hits
> >
> > Older children:
> > - 2 children: 0 hits
> > - 3 children: 1 hit
> > - 4 children: 2 hits
> > - 6 children: 3 hits
> > - 7 children: 4 hits
> > - 11 children: 5 hits
> > - 12 children: 6 hits
> >
> > (In the visual search task almost all children have pressed the button
> > correctly in all 12 visual search target trials).
> >
> > She is interested in how long participants looked at the PM and visual
> > search target, respectively, depending on whether it was a PM hit, a PM miss
> > or a visual search hit, and how this is influenced by the age group.
> > Therefore, she has got only one data point per trial. And if a participant
> > has no PM misses, there is no data point at all in this condition for this
> > participant.
> >
> > The variables are defined as follows:
> > - age_group: categorical predictor with 2 levels (younger and older
> > children)
> > - condition: categorical predictor with 3 levels (PM hit, PM miss, visual
> > search hit)
> > +++++++++++++++++++++++++++++++++++++
> >
> > My suggestion for the maximal linear mixed model would be:
> >
> > lmer(dwell_time ~ age_group*condition + (1 + condition|participant) +
> > (1|picture), data)
> >
> > I intentionally didn't use (1 + condition|picture) here because there are
> > different pictures in the task blocks (see above) - hope this makes sense.
> >
> > I have two questions:
> > 1. Am I correct with the maximal linear mixed model specifications?
> > 2. I think that the data points in the PM-miss-condition (or
> > PM-hit-condition) are not missing at random because they are missing if
> > (and only if) there are 6 data points for the same participant in the
> > PM-hit-condition (and vice versa). Do you think one has to worry about this
> > and are there any suggestions how to deal with it?
> >
> > Best,
> > Maarten
> >