# [R] Endogenous variables in ordinal logistic (or probit) regression

Paul Johnson pauljohn32 at gmail.com
Thu Apr 10 00:35:07 CEST 2008

```A student brought this question to me and I can't find any articles or
examples that are directly on point.

Suppose there are 2 ordinal logistic regression models, and one wants
to set them into a simultaneous equation framework.  Y1 might be a 4
category scale about how much the respondent likes the American Flag
and Y2 might be how much the respondent likes the Republican Party in
America.

By the usual simultaneous-equations argument, one should not simply fit
two separate polr models,

polr(Y1 ~ Y2 + X1 + X2)

and

polr(Y2 ~ Y1 + X1 + X2)

because Y1 and Y2 are endogenous.  Where does the problem arise?
Thinking back to the theoretical model, there are unmeasured scale
variables y1* and y2* that are determined by

y1* = b0 + b1 * y2 + b2 * X1 + b3 * X2 + e1
and

y2* = c0 + c1 * y1 + c2 * X1 + c3 * X2 + e2

y1* and y2* are not observed; we see only the categorical outcomes Y1
and Y2 that correspond to

Y1 = 0   if          y1* < pi1
Y1 = 1   if   pi1 <= y1* < pi2
Y1 = 2   if   pi2 <= y1* < pi3
Y1 = 3   if   pi3 <= y1*

and similarly for Y2.
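
# A minimal R sketch of the threshold mapping above, for illustration only.
# The cutpoint values and the simulated y1.star are assumptions made up for
# this example; they are not estimates from any data.
set.seed(1)
cutpoints <- c(-1, 0, 1)     # hypothetical pi1 < pi2 < pi3
y1.star   <- rnorm(10)       # stand-in for the unobserved y1*
# findInterval() gives 0 below pi1, 1 on [pi1, pi2), 2 on [pi2, pi3),
# and 3 from pi3 upward, which matches the coding of Y1 above.
Y1 <- findInterval(y1.star, cutpoints)
table(Y1)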

Since e1 is "going into" y1*, and y1* "goes into" y2*, there is a good
chance that the error term e1 is correlated with y2*, and therefore
with Y2, which appears as a regressor in the equation for y1*.

Running

polr(Y1 ~ Y2 + X1 + X2)

in isolation might give badly biased estimates.

I have found a well-developed literature that deals with this question
when one of the Y's is dichotomous:

Rivers, Douglas and Quang H. Vuong. 1988. Limited Information
Estimators and Exogeneity Tests for Simultaneous Probit Models.
Journal of Econometrics 39: 347-366.

Alvarez, R. Michael and Garrett Glasgow. 1999. Two-Stage Estimation of
Nonrecursive Choice Models. Political Analysis 8: 11-24.

I have not found anybody who has estimated one of these models with R,
however, and was hoping to get an example from someone.

I would also like to know if there is likely to be a problem extending
the estimation framework to two multi-category dependent variables.
In particular, suppose one estimates a first-stage model of Y1, as in

polr(Y1 ~ X1 + X2 + Z1)

to obtain predicted values of y1* (y1*-hat, the estimated value of the
linear predictor, I believe). What would be the properties of the
parameter estimates from a second-stage regression that uses this
instrumental variable:

polr(Y2 ~ y1*-hat  + X1 + X2)
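
# A hedged sketch of the mechanics of the two steps above, using MASS::polr
# on simulated data. Everything about the simulation is an assumption made
# for illustration: the coefficients, the quartile cutpoints, the role of Z1
# as a variable excluded from the y2* equation, and the choice to let the
# latent variables (rather than the observed categories) feed back on each
# other. It shows only how the pieces fit together in R; it says nothing
# about whether the resulting estimator is consistent, and the second-stage
# standard errors ignore the fact that y1.hat was itself estimated.
library(MASS)
set.seed(42)
n  <- 2000
X1 <- rnorm(n); X2 <- rnorm(n); Z1 <- rnorm(n)
b1 <- 0.5; c1 <- 0.5                      # hypothetical feedback effects
u1 <- 0.5 * X1 + 0.5 * X2 + Z1 + rlogis(n)
u2 <- 0.5 * X1 - 0.5 * X2 + rlogis(n)
# Reduced form of y1* = b1 * y2* + u1 and y2* = c1 * y1* + u2
y1.star <- (u1 + b1 * u2) / (1 - b1 * c1)
y2.star <- (u2 + c1 * u1) / (1 - b1 * c1)
# Cut each latent scale into four ordered categories at its quartiles
cut4 <- function(v) cut(v, c(-Inf, quantile(v, c(0.25, 0.5, 0.75)), Inf),
                        labels = 0:3, ordered_result = TRUE)
dat <- data.frame(Y1 = cut4(y1.star), Y2 = cut4(y2.star), X1, X2, Z1)
# First stage: ordinal logit of Y1 on the exogenous variables
stage1 <- polr(Y1 ~ X1 + X2 + Z1, data = dat)
dat$y1.hat <- stage1$lp                   # fitted linear predictor (y1*-hat)
# Second stage: replace the endogenous regressor with y1.hat
stage2 <- polr(Y2 ~ y1.hat + X1 + X2, data = dat, Hess = TRUE)
summary(stage2)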

As far as I can tell, this instrumental variables approach is the only
realistic way to do this.

I am aware of some articles that claim that a multi-category logistic
regression essentially boils down to a series of dichotomous logits,
in the sense that the dependent variable can be thought of as a
sequence of questions: "are you in group 0 or group 1", "are you in
group 1 or group 2", and so forth.

Cole, Stephen R., Paul D. Allison, and Cande V. Ananth. 2004.
Estimation of Cumulative Odds Ratios. Annals of Epidemiology 14(3):
172-178.

Following that approach, one could convert the data into the
cumulative logistic format and then proceed with the methods proposed
for binary dependent variables.  I'm cautious about that approach
because its results are not equivalent to the maximum likelihood
estimates that polr, for example, would give, and I don't quite see
the advantage of building on it.
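
# A rough sketch of the "series of dichotomous logits" idea, assuming the
# simplest version: one separate binary logit per cutpoint. This is not the
# pooled estimator described by Cole, Allison, and Ananth, and the data and
# cutpoints are simulated assumptions. Each fit returns its own slope for
# X1, whereas polr imposes a single common slope, which is one way to see
# that the results are not the same as the polr maximum likelihood fit.
set.seed(7)
n  <- 500
X1 <- rnorm(n)
Y  <- findInterval(X1 + rlogis(n), c(-1, 0, 1))   # hypothetical outcome, coded 0..3
# One logit per dichotomy: Y >= 1, Y >= 2, Y >= 3
fits <- lapply(1:3, function(k) glm(I(Y >= k) ~ X1, family = binomial))
sapply(fits, function(f) coef(f)["X1"])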