[R] Adapting thresholds for predictions of ordinal logistic regression

Wed Mar 24 23:49:06 CET 2004

I'm dealing with a classification problem using ordinal logistic  
regression. In the case of binary logistic regression with unequal  
proportions of 0's and 1's, a threshold in the interval [0,1] has to be  
adapted to transform back the predicted probabilities into 0 and 1.  
This can be done quite straightforward using e.g. the Kappa statistics  
as accuracy criterion.

With ordinal logistic regression this seems to be more cumbersome,  
since several thresholds have to be adapted. Here, the Gamma statistics  
could be used as accuracy criterion.
Below is an example showing individual response probabilities when you  
have equal and unequal proportions of four response categories. In the  
case of equal proportions (upper panel), one would reasonably assign  
the category with the highest probability. However, using the highest  
probability for unequal proportions (lower panel) would result in too  
many observations of class 2 being predicted as class 1.

Is there any objective way to select the thresholds for assigning the  
categories in the case of unequal proportions?
Thanks for your help!

Christof

## R code
library(Design)

# Data set with equal proportions
df1 <-  
cbind.data.frame(y=factor(c(rep(1,50),rep(2,50),rep(3,50),rep(4,50))))
df1$x <-  
c(rnorm(50,50,30),rnorm(50,100,30),rnorm(50,150,30),rnorm(50,200,30))

# Data set with unequal proportions
df2 <-  
cbind.data.frame(y=factor(c(rep(1,200),rep(2,50),rep(3,30),rep(4,20))))
df2$x <-  
c(rnorm(200,50,30),rnorm(50,100,30),rnorm(30,150,30),rnorm(20,200,30))

# Fitting ordinal logistic regression models (proportional odds)
f1 <- lrm(y ~ x, data=df1, x=TRUE, y=TRUE)
f2 <- lrm(y ~ x, data=df2, x=TRUE, y=TRUE)

# Individual response probabilities
f.seq <- seq(-50,300)
f1.pred <- predict.lrm(f1,newdata=f.seq,type="fitted.ind")
f2.pred <- predict.lrm(f2,newdata=f.seq,type="fitted.ind")

par(mfrow=c(2,1))

# First plot (equal proportions)
plot(f.seq,  
f1.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c( 
-50,300))
lines(f.seq,f1.pred[,2],col="red")
lines(f.seq,f1.pred[,3],col="blue")
lines(f.seq,f1.pred[,4],col="green")
abline(v=c(50,100,150,200),lty=3)
par(new=T)
plot(df1$x,df1$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300))
axis(4,at=pretty(range(as.numeric(df1$y))))

# Second plot (unequal proportions)
plot(f.seq,  
f2.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c( 
-50,300))
lines(f.seq,f2.pred[,2],col="red")
lines(f.seq,f2.pred[,3],col="blue")
lines(f.seq,f2.pred[,4],col="green")
abline(v=c(50,100,150,200),lty=3)
par(new=T)
plot(df2$x,df2$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300))
axis(4,at=pretty(range(as.numeric(df2$y))))