# [R] Ordinal data - Regression Trees & Proportional Odds

Liaw, Andy andy_liaw at merck.com
Thu May 29 14:07:16 CEST 2003

```> From: John Fieberg [mailto:John.Fieberg at dnr.state.mn.us]
>
> I have a data set w/ an ordinal response taking on one of 10 categories.
> I am considering using polr to fit a cumulative logits model.  I
> previously fit the model in SAS (using proc logistic) which provides a
> test for the proportional odds assumption (p < 0.001 for the test).  Are
> there simple diagnostic plots that can be used to look at the validity
> of this assumption and possibly help w/ modifying the model as
> appropriate?  Any references or examples of useful R code for
> the proportional odds assumption would be much appreciated!
>
> I also used a regression tree approach to explore this data set.  In
> doing so, I treated the response as numeric, using the rpart library.  I
> am rather new to regression trees - and wondered about the validity of
> this approach.  I used cross-validation to prune the tree - but plots of
> the response clearly indicate that the data are non-normal and don't
> have equal variance (the data are highly skewed towards larger response
> categories - values of 8-10).  I have seen some people suggest that the
> tree approach is essentially non-parametric - but then I have seen other
> references suggesting examination of residual plots and potential
> transformations of the response to ensure homogeneity of variance.  For
> this data set, it will be difficult to find an appropriate
> transformation, given the large number of responses near 10 (i.e., the
> fact that the data are constrained to be less than or equal to 10
> results in strange residual plots).

I can't say anything about logistic models, but would like to say a few
words about regression trees.

AFAIK there's no implementation (or description) of a tree algorithm that
handles an ordinal response.  We discussed this with Prof. Breiman some
time last year, and it is not straightforward at all (to us, at least).

Regression trees are non-parametric models in the sense that the regression
functions they estimate can have arbitrary form.  However, the least squares
(or even least absolute value) splitting criterion implicitly assumes
homoscedasticity.  As a matter of fact, the CART book (Breiman, Friedman,
Olshen & Stone, 1984) has a discussion of the effect of heteroscedasticity on
regression trees.
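
# A rough sketch of the point above, not a definitive analysis.  Everything
# here is made up for illustration: the simulated data frame `dat`, the
# predictors `x1` and `x2`, and the 1-10 response `y` piled up near 10.
library(rpart)

set.seed(1)
n   <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
# Latent score, rounded and clamped to 1..10, so many values hit the ceiling
score <- 7 + 4 * dat$x1 + rnorm(n, sd = 2)
dat$y <- pmin(10, pmax(1, round(score)))

# Treat the ordinal response as numeric: least squares ("anova") splitting
fit <- rpart(y ~ x1 + x2, data = dat, method = "anova")

# Prune at the cp value with the smallest cross-validated error
cp.best <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
fit.pr  <- prune(fit, cp = cp.best)

# Spread of the response within each leaf; very unequal variances suggest
# the homoscedasticity implicit in the splitting criterion is doubtful
tapply(dat$y, fit.pr$where, var)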

HTH,
Andy

> Any help is much appreciated!
>
> John Fieberg, Ph.D.
> Wildlife Biometrician, Minnesota DNR