[R] Question on approximations of full logistic regression model

Wed May 18 15:01:01 CEST 2011

I tried to make a histogram of bootstrap distribution of my logistic 
model according to "Regression Model Strategy" (pp197-200). Attached is 
the histogram I made. The figure demonstrates bootstrap distribution of 
log odds ratio from my logistic model. The solid curve is a kernel 
density estimate and dashed curve is a normal density with the dame mean 
and standard deviation as the bootstrapped values. Vertical lines 
indicate asymmetric 0.9, 0.95, and 0,99 two-sided confidence limits for 
the log odds ratio based on quantiles of the bootstrap values.

It seems to me that bootstrap distribution is normal and that estimation 
of confidence interval is, ummm, accurate.

Am I right?

The codes I used are followings;

 > library(rms)
 > b <- bootcov(MyModel.penalized, B=1000, coef.reps=T)
 > s <- summary(MyModel.penalized, x1=c(1.0, 1.5), x2=c(1.0, 1.5), 
ClinicalScore=c(4,6), procedure=c("E", "A"))
 > X <- predict(MyModel.penalized, data.frame(T1=c(1.0, 1.5), T2=c(1.0, 
1.5), ClinicalScore=c(4,6), procedure=c("E", "A")), type="x")
 > X
   Intercept  x1  x2  ClinicalScore   procedure=E
1         1 1.0 1.0              4             1
2         1 1.5 1.5              6             0
 > Xdif <- X[2, drop=F] -X[1, drop=F]
 > Xdif
   Intercept  x1  x2  ClinicalScore   procedure=E
2         0 0.5 0.5              2            -1
 > conf <- c(.9, .95, .99)
 > bootplot(b, X=Xdif, conf.int=conf, xlim=c(0, 6))

 > boot.log.odds.ratio <- b$boot.Coef %*% t(Xdif)
 > sd <- sqrt(var(boot.log.odds.ratio))
 > sd
           2
2 0.7412509
 > z <- seq(0, 6, length=104)
 > lines(z, dnorm(z, mean=mean(boot.log.odds.ratio), sd = sd), lty=2)

(11/05/16 22:01), Frank Harrell wrote:
> The choice is not clear, and requires some simulations to estimate the
> average absolute error of the covariance matrix estimators.
> Frank
>
>
> 細田弘吉 wrote:
>>
>> Thank you for your reply, Prof. Harrell.
>>
>> I agree with you. Dropping only one variable does not actually help a lot.
>>
>> I have one more question.
>> During analysis of this model I found that the confidence
>> intervals (CIs) of some coefficients provided by bootstrapping (bootcov
>> function in rms package) was narrower than CIs provided by usual
>> variance-covariance matrix and CIs of other coefficients wider.  My data
>> has no cluster structure. I am wondering which CIs are better.
>> I guess bootstrapping one, but is it right?
>>
>> I would appreciate your help in advance.
>> --
>> KH
>>
>>
>>
>> (11/05/16 12:25), Frank Harrell wrote:
>>> I think you are doing this correctly except for one thing.  The
>>> validation
>>> and other inferential calculations should be done on the full model.  Use
>>> the approximate model to get a simpler nomogram but not to get standard
>>> errors.  With only dropping one variable you might consider just running
>>> the
>>> nomogram on the entire model.
>>> Frank
>>>
>>>
>>> KH wrote:
>>>>
>>>> Hi,
>>>> I am trying to construct a logistic regression model from my data (104
>>>> patients and 25 events). I build a full model consisting of five
>>>> predictors with the use of penalization by rms package (lrm, pentrace
>>>> etc) because of events per variable issue. Then, I tried to approximate
>>>> the full model by step-down technique predicting L from all of the
>>>> componet variables using ordinary least squares (ols in rms package) as
>>>> the followings. I would like to know whether I am doing right or not.
>>>>
>>>>> library(rms)
>>>>> plogit<- predict(full.model)
>>>>> full.ols<- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure,
>>>>> sigma=1)
>>>>> fastbw(full.ols, aics=1e10)
>>>>
>>>>    Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
>>>>    stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991
>>>>    x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882
>>>>    procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711
>>>>    ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544
>>>>    x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000
>>>>
>>>> Then, fitted an approximation to the full model using most imprtant
>>>> variable (R^2 for predictions from the reduced model against the
>>>> original Y drops below 0.95), that is, dropping "stenosis".
>>>>
>>>>> full.ols.approx<- ols(plogit ~ x1+x2+ClinicalScore+procedure)
>>>>> full.ols.approx$stats
>>>>             n  Model L.R.        d.f.          R2           g       Sigma
>>>> 104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622
>>>>
>>>> This approximate model had R^2 against the full model of 0.99.
>>>> Therefore, I updated the original full logistic model dropping
>>>> "stenosis" as predictor.
>>>>
>>>>> full.approx.lrm<- update(full.model, ~ . -stenosis)
>>>>
>>>>> validate(full.model, bw=F, B=1000)
>>>>             index.orig training    test optimism index.corrected    n
>>>> Dxy           0.6425   0.7017  0.6131   0.0887          0.5539 1000
>>>> R2            0.3270   0.3716  0.3335   0.0382          0.2888 1000
>>>> Intercept     0.0000   0.0000  0.0821  -0.0821          0.0821 1000
>>>> Slope         1.0000   1.0000  1.0548  -0.0548          1.0548 1000
>>>> Emax          0.0000   0.0000  0.0263   0.0263          0.0263 1000
>>>>
>>>>> validate(full.approx.lrm, bw=F, B=1000)
>>>>             index.orig training    test optimism index.corrected    n
>>>> Dxy           0.6446   0.6891  0.6265   0.0626          0.5820 1000
>>>> R2            0.3245   0.3592  0.3428   0.0164          0.3081 1000
>>>> Intercept     0.0000   0.0000  0.1281  -0.1281          0.1281 1000
>>>> Slope         1.0000   1.0000  1.1104  -0.1104          1.1104 1000
>>>> Emax          0.0000   0.0000  0.0444   0.0444          0.0444 1000
>>>>
>>>> Validatin revealed this approximation was not bad.
>>>> Then, I made a nomogram.
>>>>
>>>>> full.approx.lrm.nom<- nomogram(full.approx.lrm,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.approx.lrm.nom)
>>>>
>>>> Another nomogram using ols model,
>>>>
>>>>> full.ols.approx.nom<- nomogram(full.ols.approx,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.ols.approx.nom)
>>>>
>>>> These two nomograms are very similar but a little bit different.
>>>>
>>>> My questions are;
>>>>
>>>> 1. Am I doing right?
>>>>
>>>> 2. Which nomogram is correct
>>>>
>>>> I would appreciate your help in advance.
>>>>
>>>> --
>>>> KH
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x5factor_final-CI-histogram.pdf
Type: application/pdf
Size: 103774 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110518/00b34a96/attachment.pdf>