[R] Difference between R and SAS in Concordance index in ordinal logistic regression
Frank Harrell
f.harrell at Vanderbilt.Edu
Thu Jan 24 20:09:46 CET 2013
Please define 'mean probabilities'.
To compute the C-index or Dxy you need anything that is monotonically
related to the prediction of interest, including the linear combination
of covariates ignoring all intercepts. In other words you don't need
to go to the trouble of computing probabilities unless you are binning,
as the binning is usually done on a controllable 0-1 scale. When I bin
I just choose the middle intercept, I seem to recall. Also try running
SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer
for C. [I wrote the original algorithm SAS uses for this in the old SAS
PROC LOGIST. Binning was just for speed.]
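
As a rough illustration of the point about monotonic transformations and binning, here is a minimal base-R sketch. The helper `c_index`, the simulated data, the intercept of -2 and the bin width of 0.1 are all assumptions made up for this example; none of this is the code used by lrm or by SAS.

```r
# Concordance index by brute-force pair counting (base R only).
# `c_index` is a hypothetical helper, not a function from rms or Hmisc.
c_index <- function(pred, y) {
  conc <- 0; ties <- 0; relevant <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next              # pairs tied on the response are not usable
      relevant <- relevant + 1
      d <- sign(pred[i] - pred[j]) * sign(y[i] - y[j])
      if (d > 0) conc <- conc + 1         # prediction orders the pair like the response
      else if (d == 0) ties <- ties + 1   # tied predictions count one half
    }
  }
  (conc + 0.5 * ties) / relevant
}

# Made-up data: a 4-level ordinal response and a noisy score that
# stands in for the linear predictor without intercepts.
set.seed(1)
y  <- sample(1:4, 40, replace = TRUE)
lp <- y + rnorm(40)

c_lp   <- c_index(lp, y)                          # linear predictor
c_prob <- c_index(plogis(lp - 2), y)              # "probability" with an arbitrary intercept
c_bin  <- c_index(round(plogis(lp - 2), 1), y)    # coarse bins on the 0-1 scale

c(linear_predictor = c_lp, probability = c_prob, binned = c_bin)
```

The first two values agree because plogis() preserves the ordering of the scores. The binned value may differ, since rounding can create tied predictions.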
You might also re-run SAS after negating the response variable.
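
To see why negating the response is worth trying, the sketch below checks on made-up data that reversing the order of the response turns C into 1 - C. The helper `c_index_outer` and the simulated data are assumptions for illustration only.

```r
# Vectorised pair counting with outer(); `c_index_outer` is a hypothetical helper.
# Every pair is visited twice, as (i, j) and (j, i), which leaves the ratio unchanged.
c_index_outer <- function(pred, y) {
  dy <- sign(outer(y, y, "-"))
  dp <- sign(outer(pred, pred, "-"))
  usable <- dy != 0                          # keep only pairs with different responses
  conc <- sum(dy[usable] * dp[usable] > 0)   # concordant pairs
  tied <- sum(dp[usable] == 0)               # pairs tied on the prediction
  (conc + 0.5 * tied) / sum(usable)
}

set.seed(2)
y    <- sample(1:4, 32, replace = TRUE)      # made-up ordinal response
pred <- y + rnorm(32)                        # made-up predictor

c_orig <- c_index_outer(pred, y)
c_neg  <- c_index_outer(pred, -y)            # same predictor, response order reversed
c(original = c_orig, negated = c_neg, sum = c_orig + c_neg)
```

The two values sum to 1, so a C far below 0.5 in one package can simply mean that the response or the predictor is ordered the other way round.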
Frank
blackscorpio wrote
> Dear Dr Harrell,
> Thank you very much for your answer. Actually I also tried to find the C
> index by hand on these data using the mean probabilities and I found
> 0.968, as you just showed.
> I now understand why I had a slight difference from the output of lrm. I
> am thus convinced that this result is correct.
>
> I read in the SAS help that PROC LOGISTIC also does some
> binning (BINWIDTH option):
>
> http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
>
> But I cannot explain why the difference between the two packages is that
> huge, especially since the class probabilities are the same.
>
> Do you think it could be because the mean probabilities are
> computed differently?
>
> Thanks for your help and best regards,
> OC
>
>
>> Date: Thu, 24 Jan 2013 05:28:13 -0800
>> From: f.harrell@
>> To: r-help@
>> Subject: Re: [R] Difference between R and SAS in Concordance index in
>> ordinal logistic regression
>>
>> lrm does some binning to make the calculations faster. The exact
>> calculation is obtained by running
>>
>> f <- lrm(...)
>> rcorr.cens(predict(f), DA), which results in:
>>
>>      C Index         Dxy        S.D.           n     missing
>>   0.96814404  0.93628809  0.03808336 32.00000000  0.00000000
>>   uncensored  Relevant Pairs    Concordant     Uncertain
>>  32.00000000    722.00000000  699.00000000    0.00000000
>>
>> I.e., C=.968 instead of .963. But this is even farther away than the
>> value from SAS you reported.
>>
>> If you don't believe the rcorr.cens result, create a tiny example and do
>> the calculations by hand.
>> Frank
>>
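
The figures in the rcorr.cens output can be checked with a few lines of arithmetic. The sketch below only reuses numbers that appear in this thread (722 relevant pairs, 699 concordant, and the response frequencies 6, 13, 9, 4 from the original post). It assumes there are no tied predictions, so that C is simply concordant over relevant pairs.

```r
# Figures copied from the rcorr.cens output in this thread.
relevant   <- 722
concordant <- 699

C   <- concordant / relevant    # assumes no tied predictions
Dxy <- 2 * (C - 0.5)            # Somers' Dxy as a rescaling of C
round(c(C = C, Dxy = Dxy), 6)

# Relevant pairs implied by the response frequencies 6, 13, 9, 4:
freq      <- c(6, 13, 9, 4)
unordered <- choose(sum(freq), 2) - sum(choose(freq, 2))  # pairs with different responses
ordered   <- 2 * unordered                                # each pair counted in both orders
ordered
```

The count of ordered pairs matches the 722 relevant pairs reported, which suggests that rcorr.cens counts each usable pair in both orders.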
>>
>> blackscorpio81 wrote
>> > Dear R users,
>> >
>> > Please allow me to ask for your help.
>> > I am currently using Frank Harrell Jr's package "rms" to fit an ordinal
>> > logistic regression model with proportional odds. To assess the model's
>> > predictive ability, the C concordance index is displayed and equals
>> > 0.963.
>> >
>> > This is the code I used, with the data attached:
>> > data.csv <http://r.789695.n4.nabble.com/file/n4656409/data.csv>
>> >
>> >> require(rms)
>> >> a <- read.csv2("/data.csv", row.names =, na.strings = c("", " "), dec = ".")
>> >> lrm(DA ~ SJ + TJ, data = a)
>> >
>> > Logistic Regression Model
>> >
>> > lrm(formula = DA ~ SJ + TJ, data = a)
>> >
>> > Frequencies of Responses
>> >
>> >  1  2  3  4
>> >  6 13  9  4
>> >
>> >                      Model Likelihood     Discrimination    Rank Discrim.
>> >                         Ratio Test            Indexes          Indexes
>> > Obs            32    LR chi2      53.14    R2       0.875    C       0.963
>> > max |deriv| 6e-06    d.f.             2    g        8.690    Dxy     0.925
>> >                      Pr(> chi2) <0.0001    gr    5942.469    gamma   0.960
>> >                                            gp       0.486    tau-a   0.673
>> >                                            Brier    0.022
>> >
>> >          Coef    S.E.   Wald Z Pr(>|Z|)
>> > y>=   -0.6161 0.6715  -0.92   0.3589
>> > y>=   -6.5949 2.3750  -2.78   0.0055
>> > y>=  -16.2358 5.3737  -3.02   0.0025
>> > SJ     1.4341 0.5180   2.77   0.0056
>> > TJ     0.5312 0.2483   2.14   0.0324
>> >
>> > I wanted to compare the results with SAS. I found the same slopes and
>> > intercepts with opposite signs, which is expected since R models the
>> > probabilities P(Y>=k|X) whereas SAS models the probabilities P(Y<=k|X)
>> > (see the attached PDF, page 2, table "Association des probabilités
>> > prédites et des réponses observées", i.e. "Association of Predicted
>> > Probabilities and Observed Responses").
>> > SAS_Report_-_Logistic_Regression.pdf
>> >
>> <http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdf>
>> >
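
The sign reversal between the two parameterisations can be sketched in base R using the coefficients printed in the lrm output. The covariate values below are hypothetical, chosen only for illustration. The intercepts are taken in the order printed, on the assumption that they correspond to the three upper cutpoints of the four-level response.

```r
# Coefficients copied from the lrm output in the post.
alpha <- c(-0.6161, -6.5949, -16.2358)   # intercepts, in the order printed
beta  <- c(SJ = 1.4341, TJ = 0.5312)

# Hypothetical covariate values, not taken from data.csv.
x  <- c(SJ = 3, TJ = 5)
xb <- sum(beta * x)

p_ge <- plogis(alpha + xb)     # P(Y >= k | X) for k = 2, 3, 4
p_le <- plogis(-alpha - xb)    # all signs flipped: P(Y <= k - 1 | X)

cbind(p_ge, p_le, total = p_ge + p_le)

# Cell probabilities P(Y = k | X), k = 1..4, from either parameterisation.
cell_from_ge <- -diff(c(1, p_ge, 0))
cell_from_le <- diff(c(0, p_le, 1))
rbind(cell_from_ge, cell_from_le)
```

Both parameterisations give the same cell probabilities, but the linear predictors have opposite signs. A concordance index computed on the flipped predictor would therefore be reflected around 0.5.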
>> > I chose the order for levels.
>> >
>> > I checked that the corresponding probabilities P(Y=k|X) are the same
>> > in both packages. But I can't understand why in SAS the C index drops
>> > from 0.963 down to 0.332.
>> >
>> > I have read a lot about this and it seems to me that the two packages
>> > use slightly different techniques to compute the C index; it is
>> > nevertheless surprising to me to observe such a shift in the results.
>> >
>> > Does anyone have a clue about this?
>> > Thank you very much for your help
>> > Blackscorpio
>>
>>
>>
>>