# [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

Charlotte Whitham charlotte.whitham at gmail.com
Wed Nov 26 17:55:20 CET 2014

```Dear Rune,

Thank you for your prompt reply and it looks like the ordinal package could be the answer I was looking for!

If you don't mind, I'd also like to know please what to do if the tests show the proportional odds assumption is NOT met. (Unfortunately I notice effects from almost all variables that breach the proportional odds assumption in my dataset)

Would you recommend a multinomial logistic model? Or re-scaling of the data?

Best wishes,

Charlie

On 26 Nov 2014, at 14:08, Rune Haubo <rune.haubo at gmail.com> wrote:

> Dear Charlie,
>
> test for non-proportional odds using the ordinal package (warning:
> self-promotion) using the wine data set also from the ordinal package.
>
> Hope this is something you can use.
> Cheers,
> Rune
>
>> library(ordinal)
>> ## Fit model:
>> fm <- clm(rating ~ temp + contact, data=wine)
>> summary(fm)
> formula: rating ~ temp + contact
> data:    wine
>
> logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01
>
> Coefficients:
>           Estimate Std. Error z value Pr(>|z|)
> tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
> contactyes   1.5278     0.4766   3.205  0.00135 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Threshold coefficients:
>    Estimate Std. Error z value
> 1|2  -1.3444     0.5171  -2.600
> 2|3   1.2508     0.4379   2.857
> 3|4   3.4669     0.5978   5.800
> 4|5   5.0064     0.7309   6.850
>> ## Model with non-proportional odds for contact:
>> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
>> ## Likelihood ratio test of non-proportional odds:
>> anova(fm, fm2)
> Likelihood ratio tests of cumulative link models:
>
> fm  rating ~ temp + contact ~1       logit flexible
> fm2 rating ~ temp           ~contact logit flexible
>
>    no.par    AIC  logLik LR.stat df Pr(>Chisq)
> fm       6 184.98 -86.492
> fm2      9 190.42 -86.209  0.5667  3      0.904
>> ## Automatic tests of non-proportional odds for all varibles:
>> nominal_test(fm)
> Tests of nominal effects
>
> formula: rating ~ temp + contact
>        Df  logLik    AIC    LRT Pr(>Chi)
> <none>     -86.492 184.98
> temp     3 -84.904 187.81 3.1750   0.3654
> contact  3 -86.209 190.42 0.5667   0.9040
>
> On 25 November 2014 at 17:21, Charlotte Whitham
> <charlotte.whitham at gmail.com> wrote:
>> Dear list,
>>
>> I have used the ‘polr’ function in the MASS package to run an ordinal logistic regression for an ordinal categorical response variable with 15 continuous explanatory variables.
>> I have used the code (shown below) to check that my model meets the proportional odds assumption following advice provided at (http://www.ats.ucla.edu/stat/r/dae/ologit.htm) – which has been extremely helpful, thank you to the authors! However, I’m a little worried about the output implying that not only are the coefficients across various cutpoints similar, but they are exactly the same (see graphic below).
>>
>> Here is the code I used (and see attached for the output graphic)
>>
>>
>> b<-polr(FGV1b\$FG1_val_cat ~ FGV1b\$X + FGV1b\$Y + FGV1b\$Slope + FGV1b\$Ele + FGV1b\$Aspect + FGV1b\$Prox_to_for_FG + FGV1b\$Prox_to_for_mL + FGV1b\$Prox_to_nat_border + FGV1b\$Prox_to_village + FGV1b\$Prox_to_roads + FGV1b\$Prox_to_rivers + FGV1b\$Prox_to_waterFG + FGV1b\$Prox_to_watermL + FGV1b\$Prox_to_core + FGV1b\$Prox_to_NR, data = FGV1b, Hess=TRUE)
>>
>> #Checking the assumption. So the following code will estimate the values to be graphed. First it shows us #the logit transformations of the probabilities of being greater than or equal to each value of the target #variable
>>
>> FGV1b\$FG1_val_cat<-as.numeric(FGV1b\$FG1_val_cat)
>>
>> sf <- function(y) {
>>
>>  c('VC>=1' = qlogis(mean(FGV1b\$FG1_val_cat >= 1)),
>>
>>    'VC>=2' = qlogis(mean(FGV1b\$FG1_val_cat >= 2)),
>>
>>    'VC>=3' = qlogis(mean(FGV1b\$FG1_val_cat >= 3)),
>>
>>    'VC>=4' = qlogis(mean(FGV1b\$FG1_val_cat >= 4)),
>>
>>    'VC>=5' = qlogis(mean(FGV1b\$FG1_val_cat >= 5)),
>>
>>    'VC>=6' = qlogis(mean(FGV1b\$FG1_val_cat >= 6)),
>>
>>    'VC>=7' = qlogis(mean(FGV1b\$FG1_val_cat >= 7)),
>>
>>    'VC>=8' = qlogis(mean(FGV1b\$FG1_val_cat >= 8)))
>>
>> }
>>
>>  (t <- with(FGV1b, summary(as.numeric(FGV1b\$FG1_val_cat) ~ FGV1b\$X + FGV1b\$Y + FGV1b\$Slope + FGV1b\$Ele + FGV1b\$Aspect + FGV1b\$Prox_to_for_FG + FGV1b\$Prox_to_for_mL + FGV1b\$Prox_to_nat_border + FGV1b\$Prox_to_village + FGV1b\$Prox_to_roads + FGV1b\$Prox_to_rivers + FGV1b\$Prox_to_waterFG + FGV1b\$Prox_to_watermL + FGV1b\$Prox_to_core + FGV1b\$Prox_to_NR, fun=sf)))
>>
>>
>>
>> #The table displays the (linear) predicted values we would get if we regressed our
>>
>> #dependent variable on our predictor variables one at a time, without the parallel slopes
>>
>> #assumption. So now, we can run a series of binary logistic regressions with varying cutpoints
>>
>> #on the dependent variable to check the equality of coefficients across cutpoints
>>
>> par(mfrow=c(1,1))
>>
>> plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,7:8]))
>>
>>
>>
>> Apologies that I am no statistics expert and perhaps I am missing something obvious here. However, I have spent a long time trying to figure out if there is a problem in how I tested the model assumption and also trying to figure out other ways to run the same kind of model.
>>
>> For example, I read in many help mailing lists that others use the vglm function (in the VGAM package) and the lrm function (in the rms package) (for example see here:  http://stats.stackexchange.com/questions/25988/proportional-odds-assumption-in-ordinal-logistic-regression-in-r-with-the-packag). I have tried to run the same models but am continuously coming up against warnings and errors.
>>
>> For example, when I try to fit the vglm model with the ‘parallel=FALSE’ argument (as the previous link mentions is important for testing the proportional odds assumption), I encounter the following error:
>>
>>
>>
>> Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'
>>
>>
>> In Deviance.categorical.data.vgam(mu = mu, y = y, w = w, residuals = residuals,  :
>>
>>  fitted values close to 0 or 1
>>
>>
>>
>> And after many searches for help, I can’t seem to find a way to fix this problem.
>>
>> I would like to ask please if there is anyone who might understand and be able to explain to me why the graph I produced above looks as it does. If indeed it means that something isn’t right, could you please help me find a way to test the proportional odds assumption when just using the polr function. Or if that is just not possible, then I will resort to trying to use the vglm function, but would then need some help to explain why I keep getting the error given above.
>>
>> I hope this is clear. Please do let me know if I should provide some more information that would help address this query.
>>
>> NOTE: As a background, there are 1000 datapoints here, which are actually location points across a study area. I am looking to see if there are any relationships between the categorical response variable and these 15 explanatory variables. All of those 15 explanatory variables are spatial characteristics (for example, elevation, x-y coordinates, proximity to forest etc.). The 1000 datapoints were randomly allocated using a GIS, but I took a stratified sampling approach. I made sure that 125 points were randomly chosen within each of the 8 different categorical response levels. I hope this information is also helpful.
>>
>> I am extremely grateful to anyone who could please give me some guidance with this.
>>
>> Thank you very much for your time,
>>
>> Charlie
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help