[R-sig-ME] ordered multinomial mixed model
Thomas Mang
thomasmang.ng at googlemail.com
Fri May 27 17:03:31 CEST 2011
Hi,
Consider the need for a mixed-effects model with an ordered multinomial
response variable. To the best of my knowledge, lme4 would not provide
such a thing.
However, I was wondering if such a model can be built using a
generalized mixed model with Bernoulli response variables. Here is my
thinking, with the question if this is possible as outlined below:
Code the possible response level factors in a binary fashion. For
example, let's assume that there are three levels, denote them "a", "b"
and "c" (in that order), and let the binary representation be:
a = 0
b = 10
c = 11
The idea is that the 0 / 1s represent individual Bernoulli random
variables which describe if one advances one item up in the ordered
response levels list (1) or stops traversing (0). So the first draw
tells whether the response variable is "a" (X = 0) or it is greater than
"a" (X = 1); conditional on the response is greater than "a", a second
Bernoulli random draw tells if the response is "b" (Y = 0), or greater
than "b" (where only "c" is left if Y = 1).
Now translate that into probabilities:
Let the probability of the response random variable, denoted as Z, being
"a" be written as:
P(Z == a) = p_A.
and the complementary probability that Z is greater than "a" is, of course,
P(Z > a) = 1 - p_A.
Similar to the binary coding above, the probability for Z == b or Z ==
c can be written using a conditional random variable:
P(Z == b) = P(Z == b | Z > a) * P (Z > a) = p_B * (1 - p_A).
and
P(Z == c) = P(Z == c | Z > a) * P(Z > a) = (1 - p_B) * (1 - p_A).
Hence I have just written down the probabilities for each of the three
ordered factor levels using (conditional and unconditional) Bernoulli
random variables.
Of course, each of these probabilities can also be expressed conditional
on covariates, giving rise to a regression model.
I am now wondering if above multinomial model can be realized as mixed
model by appropriately preparing the input table: For a single response
measure, the contribution to the likelihood has above been written down
as one ('a') or product of two ("b", "c") Bernoulli probabilities. Using
a Binomial regression model I should be able to yield these
probabilities exactly by using one input table row if the response value
is "a" and by coding a 0 in the response vector, while I can use two
input table rows if the response is "b" / "c" where in the first row the
response value coded as 1 (as in my binary coding above), and in the
second row it is coded as 0 if the response measure is "b", and as 1
otherwise.
Now to the covariates: Assuming within above framework the effect of
covariates is identical across all factor levels, I would simply
duplicate the covariates row if the response is "b" / "c".
Conceptionally this would be somewhat similar to a proportional odds
model, though the maths behind it is of course deviating. But consider
the case where I believe that covariates do not impose the same effect
across all factor levels. Then I could, globally, introduce an
interaction between a binary factor and a continuous covariate, where in
the case of evaluating a row checking for "a" as response the factor
takes on one level, and in a row checking for "b" / "c" it takes on the
other level. That would fit de facto two regression coefficients for my
continuous covariate, depending on which Bernoulli random variable I am
evaluation.
Finally, at the end degrees of freedom must be adjusted, as every
response measure should provide only one, independent if it had been set
up using one or two input table rows.
My question:
Is above thinking correct throughout? Is this maybe already an
established and accepted approach to simulate an ordered multinomial
regression model (independent of glm, glmm etc?)
It should work smoothly for glm-s. Is there any obstacle using above
approach when fitting a mixed-effects model? E.g. would a REML fitting
somehow blow up my thinking ? If it does, would it screw up things
totally, or just provide a not-totally-correct but maybe acceptable
approximation ?
In short, can I use above steps to get my needed ordered multinomial
mixed-effects model, or do I overlook something ?
many thanks for any input and apologize the long text,
best
Thomas
More information about the R-sig-mixed-models
mailing list