[R-sig-ME] How to use mixed-effects models on multinomial data

Fri May 29 19:13:59 CEST 2009

Thanks to all of you for your detailed comments. I find them very useful, although some of them point in different directions.

First, I should explain the structure of my data set in more detail: 

In the data set, each "item" is a list of 5 words. In an earlier analysis of these data, the response variable was the accuracy of recalling each word list (list recall): subjects either recalled a list correctly (i.e., recalled all 5 words in the list) or they did not. Because the response in this analysis was binary, I used the mixed logit model. (Note that in my original e-mail, I only wanted to show the general structure of the lmer() formula that I'm using. The formula I'm actually using looks like this: lmer(response ~ predictor1 + predictor2 + predictor2 * predictor3 + (1 + predictor1 + predictor2 + predictor2 * predictor3 | subject) + (1 | item), data, family="binomial").) In short, I have random slopes for my subjects, but no random slopes for my items. This is because all three predictors are item-specific properties, and because I want to control for any variation between subjects in their sensitivity to these properties. On the basis of model comparisons, I then gradually simplify this initial model.

Now, the analysis I'm currently struggling with is carried out on the same data set, but the response variable is now the accuracy of recalling each word in a list (item recall), with subjects recalling either 0, 1, 2, 3, 4, or 5 words correctly. So, there are six, rather than two, possible responses. It is true that for each individual word, the response is still either correct or incorrect, but since it is the response for the entire list that concerns me, I would describe the responses as multinomial. Below, you see a subset of the trials in my data set:

Subject  Trial  Item  W1  W2  W3  W4  W5  Predictor1  Predictor2  Predictor3  Correct
      1      3     9   1   1   0   1   1           1           1           1        4
      1      4    12   1   0   0   0   1           1           1           0        2
      1      5     4   0   0   0   0   0           1           1           1        0
      1      6     6   1   1   1   1   1           1           2           1        5
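
A minimal sketch of how the columns in this excerpt relate to each other, assuming the four trials are typed in by hand. The names `trials`, `word_cols`, `recomputed` and `list_correct` are made up for illustration and are not part of the original analysis. It recomputes Correct as the row sum of W1 to W5 and derives the binary list-recall response used in the earlier analysis.

```r
# The four example trials, entered by hand (object names are illustrative).
trials <- data.frame(
  Subject    = c(1, 1, 1, 1),
  Trial      = c(3, 4, 5, 6),
  Item       = c(9, 12, 4, 6),
  W1         = c(1, 1, 0, 1),
  W2         = c(1, 0, 0, 1),
  W3         = c(0, 0, 0, 1),
  W4         = c(1, 0, 0, 1),
  W5         = c(1, 1, 0, 1),
  Predictor1 = c(1, 1, 1, 1),
  Predictor2 = c(1, 1, 1, 2),
  Predictor3 = c(1, 0, 1, 1),
  Correct    = c(4, 2, 0, 5)
)

# Number of words recalled per list: the 0 to 5 response.
word_cols  <- paste0("W", 1:5)
recomputed <- rowSums(trials[, word_cols])

# Binary list-recall response: 1 only if all 5 words were recalled.
list_correct <- as.integer(recomputed == 5)
```

Only the last trial counts as a correct list under the binary coding, while the 0 to 5 coding keeps the differences between the other three trials.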

Professors Baron and Bates suggest that I use a linear mixed-effects model and, as a consequence, disregard the information that is contained in the ordering of my 6 possible responses. They further suggest that I plot the residuals against each of my predictors. This is to get an idea of how well the model fits the observed pattern of each of my predictors, right? If, say, the residuals for predictor1 are very large, that would mean that the model has fitted the pattern of this predictor very poorly, right? I have fitted the lmer model and have tried to make the residual plots, but have not succeeded. I can plot the residuals against the fitted values (although I have to admit that I find it difficult to make sense of the plot), but how do I make separate plots for each of my predictors? Please let me know if I have misunderstood something here.
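
One possible way to make such plots, as a rough sketch and not the procedure anyone in this thread prescribed: extract the residuals with resid() and plot them against each predictor column in turn. The data below are simulated stand-ins (the data frame `d`, the effect sizes and the column names are all invented), and the model has random intercepts only, to keep the example small.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the real data (all values are made up):
# 20 subjects crossed with 30 items, two item-level predictors.
d <- expand.grid(subject = factor(1:20), item = factor(1:30))
p1 <- rnorm(30)
p2 <- rnorm(30)
d$predictor1 <- p1[as.integer(d$item)]
d$predictor2 <- p2[as.integer(d$item)]

subj_eff <- rnorm(20, sd = 0.5)
item_eff <- rnorm(30, sd = 0.3)
latent <- 2.5 + 0.8 * d$predictor1 - 0.5 * d$predictor2 +
  subj_eff[as.integer(d$subject)] + item_eff[as.integer(d$item)] +
  rnorm(nrow(d))

# Round and clip to the 0..5 scale of "number of words recalled".
d$correct <- pmin(5, pmax(0, round(latent)))

# Linear mixed-effects model with crossed random intercepts.
fit <- lmer(correct ~ predictor1 + predictor2 + (1 | subject) + (1 | item),
            data = d)

d$fitted <- fitted(fit)
d$res    <- resid(fit)

# One panel for the fitted values, then one panel per predictor.
op <- par(mfrow = c(1, 3))
plot(d$fitted, d$res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
for (p in c("predictor1", "predictor2")) {
  plot(d[[p]], d$res, xlab = p, ylab = "Residual")
  abline(h = 0, lty = 2)
  lines(lowess(d[[p]], d$res), col = "red")  # smooth trend through the residuals
}
par(op)
```

In plots like these, the thing to look for is usually a systematic pattern (a trend or curve in the smooth line, or a spread that changes along the x-axis) rather than the sheer size of individual residuals. For a predictor with only a few distinct values, boxplot(d$res ~ d[[p]]) may be easier to read than a scatterplot.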

Linda

________________________________

From: r-sig-mixed-models-bounces at r-project.org on behalf of Douglas Bates
Sent: Thu 28-05-2009 20:13
To: Jonathan Baron
Cc: r-sig-mixed-models at r-project.org; Emmanuel Charpentier
Subject: Re: [R-sig-ME] How to use mixed-effects models on multinomial data

On Thu, May 28, 2009 at 9:24 AM, Jonathan Baron <baron at psych.upenn.edu> wrote:
> I had already replied to Linda Mortensen, but Emmanuel Charpentier's
> reply gives me the courage to say to the whole list roughly what I
> said before, plus a little more.

> The assumption that 0-1, 1-2, ... 4-5 are equally spaced measures of
> the underlying variable of interest may indeed be incorrect, but so
> may the assumption that the difference between 200-300 msec reaction
> time is equivalent to the difference between 300-400 msec (etc.).
> Failure of the assumptions will lead to some additional error, but, as
> argued by Dawes and Corrigan (Psych. Bull., 1974), not much.  (And you
> can look at the residuals as a function of the predictions to see how
> bad the situation is.)  In general, in my experience (for what that is
> worth), you lose far less power by assuming equal spacing than you
> lose by using a more "conservative" model that treats the dependent
> measure as ordinal only.

I'm glad to see you write that, Jonathan.  I don't have a lot of
experience modeling ordinal response data but my impression is that
there is more to lose by resorting to comparatively exotic models for
an ordinal response than by modeling it with a Gaussian "noise" term.
In cases like this where there are six levels, 0 to 5, I think your
suggestion of beginning with a linear mixed-effects model and checking
the residuals for undesirable behavior is a good start.

> Occasionally you may have a theoretical reason for NOT treating the
> dependent measure as equally spaced (e.g., when doing conjoint
> analysis), or for treating it as equally spaced (e.g., when testing
> additive factors in reaction time).
>
> In the former sort of case, it might be appropriate to fit a model to
> each subject using some other method, then look at the coefficients
> across subjects.  (This is what I did routinely before lmer.)
>
> Jon
>
> On 05/28/09 14:35, Emmanuel Charpentier wrote:
>> On Wednesday 27 May 2009 at 18:08 +0200, Linda Mortensen wrote:
>> > Dear list members,
>> >
>> > In the past, I have used the lmer function to model data sets with
>> > crossed random effects (i.e., of subjects and items) and with either a
>> > continuous response variable (reaction times) or a binary response
>> > variable (correct vs. incorrect response). For the reaction time data,
>> > I use the formula:
>> > lmer(response ~ predictor1 * predictor2 ....  + (1 + predictor1 *
>> > predictor2 .... | subject) + (1 + predictor1 * predictor2 .... |
>> > item), data)
>
> I think that the second random effect term should be (0 + ...), since
> there is already an intercept in the first one.

I don't think so.  It is quite legitimate to have random effects of
the form (1|subject) + (1|item) and the formula above is a
generalization of this.  An additive random effect for each subject is
not confounded with an additive random effect for each item.

I would be more concerned about the number of random effects per
subject and per item when you have a complex formula like 1 +
predictor1 * predictor2 on the left hand side of the random-effects
term.  If predictor1 and predictor2 are both numeric predictors this
might be justified but I would look at it carefully.
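
To make the concern about the number of random effects concrete, here is a small sketch that counts them. It assumes the expression to the left of the bar is expanded the way model.matrix() expands an ordinary formula, and the helper `count_re` and both toy data frames are invented for illustration. With q random effects per grouping level, an unstructured variance-covariance matrix has q(q+1)/2 parameters.

```r
# Hypothetical toy data: two numeric predictors, and for contrast
# two 3-level factors with the same names.
num <- data.frame(predictor1 = c(0.1, 0.5, 0.9, 1.3),
                  predictor2 = c(2, 1, 4, 3))
fac <- expand.grid(predictor1 = factor(c("a", "b", "c")),
                   predictor2 = factor(c("x", "y", "z")))

# Count the columns generated by 1 + predictor1 * predictor2 and the
# number of variances and covariances in an unstructured q x q matrix.
count_re <- function(data) {
  q <- ncol(model.matrix(~ 1 + predictor1 * predictor2, data))
  c(effects_per_group = q, var_cov_params = q * (q + 1) / 2)
}

count_re(num)  # 4 effects per group, 10 variance-covariance parameters
count_re(fac)  # 9 effects per group, 45 variance-covariance parameters
```

With numeric predictors the term stays at 4 random effects per subject (or item), while two 3-level factors already give 9, which is the kind of growth worth checking before fitting.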

>> > I'm currently working on a data set for which the response variable is
>> > number of correct items with accuracy ranging from 0 to 5. So, here
>> > the response variable is not binomial but multinomial.
>
>> This approximation may be too rough with only 5 items, though.
>> Furthermore, depending on your beliefs on the cognitive model involved
>> in giving a "correct" response, the distance between 0 and 1 correct
>> response(s) may be close to or very different from the distance between
>> 4 and 5 correct responses, which is exactly what a proportional odds
>> model (polr) tries to explain away.
>
> --
> Jonathan Baron, Professor of Psychology, University of Pennsylvania
> Home page: http://www.sas.upenn.edu/~baron
> Editor: Judgment and Decision Making (http://journal.sjdm.org)
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>