[R-sig-ME] ordered multinomial mixed model
Rune Haubo
rune.haubo at gmail.com
Sat May 28 22:02:10 CEST 2011
On 27 May 2011 17:03, Thomas Mang <thomasmang.ng at googlemail.com> wrote:
> Hi,
>
> Consider the need for a mixed-effects model with an ordered multinomial
> response variable. To the best of my knowledge, lme4 would not provide such
> a thing.
You are right; the lme4 package does not provide functions to fit
ordinal mixed models - at least not without modifications. However,
there are other CRAN (e.g., ordinal and MCMCglmm) and R-Forge (e.g.,
ordinal2) packages that does so.
>
> However, I was wondering if such a model can be built using a generalized
> mixed model with Bernoulli response variables. Here is my thinking, with the
> question if this is possible as outlined below:
This is a good idea - so good in fact that others have had it before.
Recently Ken Knoblauch posted the polmer function to this list based
on this idea and using glmer
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/005069.html).
I am not sure if polmer is exactly the translation of your description
into an R function, but it seems to me that it is fairly close.
The only source in the literature to propose this approach that I am
aware of is Winship, C. and R. D. Mare (1984). Regression models with
ordinal variables. American Sociological Review 49 (4), pp. 512-525. -
a very nice paper by the way.
>
> Code the possible response level factors in a binary fashion. For example,
> let's assume that there are three levels, denote them "a", "b" and "c" (in
> that order), and let the binary representation be:
> a = 0
> b = 10
> c = 11
>
> The idea is that the 0 / 1s represent individual Bernoulli random variables
> which describe if one advances one item up in the ordered response levels
> list (1) or stops traversing (0). So the first draw tells whether the
> response variable is "a" (X = 0) or it is greater than "a" (X = 1);
> conditional on the response is greater than "a", a second Bernoulli random
> draw tells if the response is "b" (Y = 0), or greater than "b" (where only
> "c" is left if Y = 1).
>
> Now translate that into probabilities:
>
> Let the probability of the response random variable, denoted as Z, being "a"
> be written as:
> P(Z == a) = p_A.
> and the complementary probability that Z is greater than "a" is, of course,
> P(Z > a) = 1 - p_A.
>
> Similar to the binary coding above, the probability for Z == b or Z == c
> can be written using a conditional random variable:
> P(Z == b) = P(Z == b | Z > a) * P (Z > a) = p_B * (1 - p_A).
> and
> P(Z == c) = P(Z == c | Z > a) * P(Z > a) = (1 - p_B) * (1 - p_A).
>
> Hence I have just written down the probabilities for each of the three
> ordered factor levels using (conditional and unconditional) Bernoulli random
> variables.
> Of course, each of these probabilities can also be expressed conditional on
> covariates, giving rise to a regression model.
>
> I am now wondering if above multinomial model can be realized as mixed model
> by appropriately preparing the input table: For a single response measure,
> the contribution to the likelihood has above been written down as one ('a')
> or product of two ("b", "c") Bernoulli probabilities. Using a Binomial
> regression model I should be able to yield these probabilities exactly by
> using one input table row if the response value is "a" and by coding a 0 in
> the response vector, while I can use two input table rows if the response is
> "b" / "c" where in the first row the response value coded as 1 (as in my
> binary coding above), and in the second row it is coded as 0 if the response
> measure is "b", and as 1 otherwise.
>
> Now to the covariates: Assuming within above framework the effect of
> covariates is identical across all factor levels, I would simply duplicate
> the covariates row if the response is "b" / "c". Conceptionally this would
> be somewhat similar to a proportional odds model, though the maths behind it
> is of course deviating. But consider the case where I believe that
> covariates do not impose the same effect across all factor levels. Then I
> could, globally, introduce an interaction between a binary factor and a
> continuous covariate, where in the case of evaluating a row checking for "a"
> as response the factor takes on one level, and in a row checking for "b" /
> "c" it takes on the other level. That would fit de facto two regression
> coefficients for my continuous covariate, depending on which Bernoulli
> random variable I am evaluation.
I think of the "different covariate effects for different response
levels" as the effect of that covariate being 'nominal' or un-ordered
rather than 'ordinal'. This is essentially what Peterson and Harrell
(1990) [Partial Proportional Odds Models for Ordinal Response
Variables, Applied Statistics, Vol. 39, No. 2. pp. 205-217] denoted
"partial proportional odds" for the proportional odds model. The clm
(clmm) function in package ordinal fits cumulative link (mixed) models
[in you terminology these are ordered multinomial mixed models with a
link function of your choice] allowing for some predictors having
these nominal effects rather than the usual ordinal effects.
>
> Finally, at the end degrees of freedom must be adjusted, as every response
> measure should provide only one, independent if it had been set up using one
> or two input table rows.
This approach (or at least the one detailed by Winship and Mare /
polmer) gives impressively accurate ML estimates considering that it
is an approximate procedure, but the standard errors are often much
too low - the "too many df" is one way to understand this behavior.
>
> My question:
> Is above thinking correct throughout? Is this maybe already an established
> and accepted approach to simulate an ordered multinomial regression model
> (independent of glm, glmm etc?)
> It should work smoothly for glm-s. Is there any obstacle using above
> approach when fitting a mixed-effects model? E.g. would a REML fitting
> somehow blow up my thinking ? If it does, would it screw up things totally,
> or just provide a not-totally-correct but maybe acceptable approximation ?
Strictly speaking REML is only defined for linear mixed models, but
extensions of REML or REML-like procedures to more general
(generalized and other non-linear) models have been proposed. I am not
sure what you mean by REML fitting in this situation...
>
> In short, can I use above steps to get my needed ordered multinomial
> mixed-effects model, or do I overlook something ?
I would suggest using one of the existing packages that provide the
functionality you are looking for. I am the author of ordinal and
ordinal2, so I know what these packages are doing, and I would be
happy to discuss implementation of ordinal mixed models and
approximations in more detail, but perhaps this is more appropriate of
list.
Cheers,
Rune
>
> many thanks for any input and apologize the long text,
> best
> Thomas
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
Rune Haubo Bojesen Christensen
PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark
More information about the R-sig-mixed-models
mailing list