[R] Adapting thresholds for predictions of ordinal logistic regression
Christof Bigler
christof.bigler at colorado.edu
Wed Mar 24 23:49:06 CET 2004
I'm dealing with a classification problem using ordinal logistic
regression. In the case of binary logistic regression with unequal
proportions of 0's and 1's, a threshold in the interval [0,1] has to be
adapted to transform back the predicted probabilities into 0 and 1.
This can be done quite straightforward using e.g. the Kappa statistics
as accuracy criterion.
With ordinal logistic regression this seems to be more cumbersome,
since several thresholds have to be adapted. Here, the Gamma statistics
could be used as accuracy criterion.
Below is an example showing individual response probabilities when you
have equal and unequal proportions of four response categories. In the
case of equal proportions (upper panel), one would reasonably assign
the category with the highest probability. However, using the highest
probability for unequal proportions (lower panel) would result in too
many observations of class 2 being predicted as class 1.
Is there any objective way to select the thresholds for assigning the
categories in the case of unequal proportions?
Thanks for your help!
Christof
## R code
library(Design)
# Data set with equal proportions
df1 <-
cbind.data.frame(y=factor(c(rep(1,50),rep(2,50),rep(3,50),rep(4,50))))
df1$x <-
c(rnorm(50,50,30),rnorm(50,100,30),rnorm(50,150,30),rnorm(50,200,30))
# Data set with unequal proportions
df2 <-
cbind.data.frame(y=factor(c(rep(1,200),rep(2,50),rep(3,30),rep(4,20))))
df2$x <-
c(rnorm(200,50,30),rnorm(50,100,30),rnorm(30,150,30),rnorm(20,200,30))
# Fitting ordinal logistic regression models (proportional odds)
f1 <- lrm(y ~ x, data=df1, x=TRUE, y=TRUE)
f2 <- lrm(y ~ x, data=df2, x=TRUE, y=TRUE)
# Individual response probabilities
f.seq <- seq(-50,300)
f1.pred <- predict.lrm(f1,newdata=f.seq,type="fitted.ind")
f2.pred <- predict.lrm(f2,newdata=f.seq,type="fitted.ind")
par(mfrow=c(2,1))
# First plot (equal proportions)
plot(f.seq,
f1.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c(
-50,300))
lines(f.seq,f1.pred[,2],col="red")
lines(f.seq,f1.pred[,3],col="blue")
lines(f.seq,f1.pred[,4],col="green")
abline(v=c(50,100,150,200),lty=3)
par(new=T)
plot(df1$x,df1$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300))
axis(4,at=pretty(range(as.numeric(df1$y))))
# Second plot (unequal proportions)
plot(f.seq,
f2.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c(
-50,300))
lines(f.seq,f2.pred[,2],col="red")
lines(f.seq,f2.pred[,3],col="blue")
lines(f.seq,f2.pred[,4],col="green")
abline(v=c(50,100,150,200),lty=3)
par(new=T)
plot(df2$x,df2$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300))
axis(4,at=pretty(range(as.numeric(df2$y))))
More information about the R-help
mailing list