[R-sig-ME] Mixed-model polytomous ordered logistic [SEC=Unclassified]

Fri Jan 7 02:44:09 CET 2011

Alan et al.,

I think the difference between the polr(.) and glm(.) results relates to the fundamental issue of multinomials vs independent binomials. The McCullagh (1980) cumulative probability and corresponding cumulative link approach accounts for the covariance between observed frequencies across ordinal categories which are assumed multinomial. The way you have set up the data gives multinomials with a total across categories of 1 so your glm approach treats the data as binary (i.e. TRUE vs FALSE) and as INDEPENDENT binomials. Therefore your glm approach is not equivalent to polr(.) since it ignores these covariances. Note that polr(.) uses optim(.) to fit the model so I gather without looking at the code that it incorporates the covariances as per McCullagh (1980). The composite link function approach gives the same MLEs of parameters as the McCullagh (1980) approach and can use frequencies tallied across data points with the same joint values of X and Cat as the response or it can use the (0,1) data as the response with both cases modelled as Poisson since the Poisson is a trick to use the GLM fitting algorithm for multinomial data (eg log-linear models for contingency tables). The Poisson trick is required since GLMs do not include the multinomial distn. So my point still stands that to correctly fit ordered categorical data using the GLM fitting algorithm requires the use of composite link functions.

Steve

-------------------------------------------------------------
Steven G Candy
Senior Applied Statistician
Southern Ocean Ecosystems
Australian Antarctic Division
Kingston, Tasmania 7050
Australia
ph +61 (0)3 6232 3135

-----Original Message-----
From: Alan Cobo-Lewis [mailto:Alan_Cobo-Lewis at umit.maine.edu]
Sent: Friday, 7 January 2011 8:31 AM
To: r-sig-mixed-models at r-project.org
Cc: Steve Candy; davidD at qimr.edu.au; rhbc at imm.dtu.dk; emm.charpentier at free.fr
Subject: Re: Mixed-model polytomous ordered logistic [SEC=Unclassified]

Emmanuel et al.,

I was very interested in the possibility of doing proportional odds ordinal regression with glm and lmer, but on further investigation I don't think they're fitting the same model. I know that polr and Glm.polr gave similar but slightly different coefficients for the data in Emmanuel's 2010dec30 message, but I think they differ by more than roundoff error.

I believe that the Glm.polr approach isn't exactly an ordered approach, because it doesn't enforce the constraint that the thresholds be ordered as anticipated.

It may be possible to fit unordered multinomial models in a mixed models context by fitting surrogate poisson models in lmer analogously to what is described for glm in Venables & Ripley's MASS. Some ordered models (eg continuation ratio logits) may also be fittable in lmer, but I don't see how to fit proportional odds models in lmer. (I would like to be shown wrong.)

I suspect the agreement was close in Emmanuel's 2010dec30 because it was a fairly big dataset that contained fairly good evidence that the ordering of the categories was as anticipated. But if the evidence in the data is not so strong about the ordering, or if there is evidence in the data that the ordering is not what the polr model assumes, then the polr and Glm.polr might disagree more obviously.

In the self-contained example below, Data.inorder represents a case where there are 4 categories in the DV ordered as hypothesized ("1"<"2"<"3"<"4"), and Data.outoforder represents a case where the actual order is more like "4"<="1"<"2"<"3" even though the polr model still assumes "1"<"2"<"3"<"4". In this self-contained example, note that the coefficients in fit.polr.inorder and fit.polrglm.inorder are in fair agreement (though they seem to disagree by more than can be accounted for by roundoff error), whereas the coefficients in fit.polr.outoforder and fit.polrglm.outoforder are in more clear disagreement.

Data.outoforder <- Data.inorder <- data.frame(
        Cat=ordered(c(1,1,1,1, 2,2,2, 3,3,3, 4,4)),
        X=c(
                0.5, 0.5, 0.7, 3,
                1, 1, 1.2,
                1.4, 2, 4,
                2, 5
        )
)
Data.outoforder[ Data.outoforder$Cat==4, "X" ] <- 0.4

plot( jitter(as.numeric(Cat))~X, data=Data.inorder, yaxt="n"); axis(side=2, at=1:4); title("Data.inorder, Cat 4 highest") plot( jitter(as.numeric(Cat))~X, data=Data.outoforder, yaxt="n"); axis(side=2, at=1:4); title("Data.outoforder, Cat 4 lowest")

stack.data <- function(Data) {
        lev <- levels(Data$Cat)
        ll <- length(lev)
        thr <- paste( lev[-ll], "|", lev[-1], sep="" )
        Dataset <- do.call( rbind,
                lapply(2:ll,
                        function(i) {
                                data.frame( Thr=thr[i-1],
                                        Cat=Data$Cat >= lev[i],
                                        Data[ ,names(Data)[!(names(Data) %in% c("Thr","Cat"))] ]
                                )
                        }
                )
        )
        Dataset$Thr <- factor(Dataset$Thr)
        names(Dataset)[ ! names(Dataset) %in% c("Thr","Cat") ] <- names(Data)[ names(Data)!="Cat" ]
        Dataset
}

Dataset.inorder <- stack.data(Data.inorder)
Dataset.outoforder <- stack.data(Data.outoforder)

library(MASS)

( fit.polr.inorder <- polr( Cat~X, data=Data.inorder ) )        # slope=1.096652, intercepts 0.8547446 2.2567286 4.3084577
( fit.polrglm.inorder <- glm( Cat~0+Thr+X, data=Dataset.inorder, family=binomial(link=logit) ) )        # slope=1.0496, thresholds -0.8856  -2.2468  -4.1911
# fit.polr.inorder and fit.polrglm.inorder are close, but disagree by more than roundoff error

( fit.polr.outoforder <- polr( Cat~X, data=Data.outoforder ) )  # slope=0.08432523, intercepts -0.5766306  0.4664993  1.7432196 ( fit.polrglm.outoforder <- glm( Cat~0+Thr+X, data=Dataset.outoforder, family=binomial(link=logit) ) )  # slope=0.0322, thresholds 0.6502  -0.3798  -1.6530 # fit.polr.outoforder and fit.polr.glm.outoforder aren't even close

alan

--
Alan B. Cobo-Lewis, Ph.D.               (207) 581-3840 tel
Department of Psychology                (207) 581-6128 fax
University of Maine
Orono, ME 04469-5742                    alanc at maine.edu

r-sig-mixed-models at r-project.org on Thursday, January 06, 2011 at 6:00 AM -0500 wrote:
>Message: 4
>Date: Thu, 6 Jan 2011 10:57:38 +1100
>From: Steve Candy <Steve.Candy at aad.gov.au>
>To: "'David Duffy'" <davidD at qimr.edu.au>, "'Rune Haubo'"
>       <rhbc at imm.dtu.dk>
>Cc: "'r-sig-mixed-models at r-project.org'"
>       <r-sig-mixed-models at r-project.org>
>Subject: Re: [R-sig-ME] Mixed-model polytomous ordered logistic
>       [Sec=Unclassified]
>Message-ID:
>
><42C7503E0B99F248B06BC65E7D2B4E02019D5C1C4003 at EX2K7-CCR.AAD.GOV.AU>
>Content-Type: text/plain
>
>Hi David & Rune
>
>Sorry if I have not used the correct way to respond but I only get the
>digest so I'm not sure how to do it properly.
>
>Anyway, to the point.
>
>The only way I know how to fit cumulative link functions for
>mixed-model polytomous ordered logistic (or probit, complementary
>log-log) links using the GLM fitting algorithm is to use composite link
>functions as originally published for multinomial data by Thompson and
>Baker (1981) Applied Statistics 30, 125-131. The generalisation to
>mixed models using GLMMs is given ("buried in") in Candy 1997
>Biometrics 53:146-160 (see pg 152) and uses a Poisson distn with
>constraints imposed on fitted values to sum to observed marginal
>frequencies. This composite link function approach is a bit messy to
>set up, so it might be less trouble to program it directly as simply an
>nlme problem. Also if the categories are not ordered this approach
>breaks down because its not possible to build the appropriate
>constraints into the GLMM fitting algorithm.
>
>HTH, Steve
>
>-------------------------------------------------------------
>Steven G Candy
>Senior Applied Statistician
>Southern Ocean Ecosystems
>Australian Antarctic Division
>Kingston, Tasmania 7050
>Australia
>ph +61 (0)3 6232 3135
>

___________________________________________________________________________

    Australian Antarctic Division - Commonwealth of Australia
IMPORTANT: This transmission is intended for the addressee only. If you are not the
intended recipient, you are notified that use or dissemination of this communication is
strictly prohibited by Commonwealth law. If you have received this transmission in error,
please notify the sender immediately by e-mail or by telephoning +61 3 6232 3209 and
DELETE the message.
        Visit our web site at http://www.antarctica.gov.au/
___________________________________________________________________________