[R] Predicting ordinal outcomes using lrm{Design}
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Apr 16 00:49:45 CEST 2008
jayhegde wrote:
> Dear List,
> I have two questions about how to do predictions using lrm, specifically
> how to predict the ordinal response for each observation *individually*.
> I'm very new to cumulative odds models, so my apologies if my questions are
> too basic.
>
> I have a dataset with 4000 observations. Each observation consists of
> an ordinal outcome y (i.e., rating of a stimulus with four possible ratings,
> 1 through 4), and the values of two predictor variables x1 and x2 associated
> with each stimulus:
>
> ---------------------------------------
> Obs#    y    x1      x2
> ---------------------------------------
> 1       3    2.35    -1.07
> 2       2    1.78    -0.66
> 3       4    5.19    -3.51
> ...
> 4000    1    0.63    -0.23
> ---------------------------------------
>
> I get excellent fits using
>
> fit1 <-lrm(y ~ x1+x2, data=my.dataframe1)
>
> Now I want to see how well my model can predict y for a new set of 4000
> observations. I need to predict y for each new observation *individually*.
> I know an expression like
>
> predicted1 <- predict(fit1, newdata=my.dataframe2, type="fitted.ind")
>
> can give *probability* of each of the 4 possible responses for each
> observation. So my questions are
>
> (1) How do I pick the likeliest y (i.e., likeliest of the 4 possible
> ratings) for each given new observation?
>
> (2) Are there good references that explain the theory behind this type of
> prediction to a beginner like me?
>
> Thank you very much,
> Jay Hegdé
> University of Minnesota
>
You can easily pick the highest probability category for each observation
after running predict(fit, newdataset, type='fitted.ind'), but judging the
model by the proportion of categories classified correctly uses an improper
scoring rule (i.e., an accuracy score that is optimized by the wrong model).
I suggest instead computing the Somers' Dxy rank correlation between the
predicted log odds (for any one intercept, it doesn't matter which one) and
the observed ordinal category.
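
The two steps might be sketched as follows. This is only an illustration
under assumptions that are not in the original post: the data are simulated
by a made-up helper sim() in place of my.dataframe1 and my.dataframe2, lrm()
is taken from the rms package (the successor to Design), and Dxy is computed
with rcorr.cens() from Hmisc, treating the ordinal rating as an uncensored
response.

```r
library(rms)    # assumption: rms (successor to Design) supplies lrm()
library(Hmisc)  # rcorr.cens() for Somers' Dxy

set.seed(1)

# Hypothetical simulated data: ratings 1..4 from a latent logistic scale
sim <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  latent <- x1 - 0.5 * x2 + rlogis(n)
  y <- cut(latent, c(-Inf, -1, 0, 1, Inf), labels = FALSE)
  data.frame(y, x1, x2)
}
train <- sim(1000)   # stands in for my.dataframe1
test  <- sim(1000)   # stands in for my.dataframe2

fit1 <- lrm(y ~ x1 + x2, data = train)

# Step 1: probability of each rating, then the most probable rating per row
p <- predict(fit1, newdata = test, type = "fitted.ind")
y.levels <- sort(unique(train$y))
y.hat <- y.levels[max.col(p, ties.method = "first")]

# Step 2: Somers' Dxy between the linear predictor (log odds for one
# intercept) and the observed rating in the new data
lp  <- predict(fit1, newdata = test, type = "lp")
dxy <- rcorr.cens(lp, test$y)["Dxy"]
```

Here y.hat holds one predicted rating per new observation, while dxy
summarizes how well the ordering of the predictions matches the ordering of
the observed ratings, without forcing each prediction into a single category.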
Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University