[R] Question on approximations of full logistic regression model
khosoda at med.kobe-u.ac.jp
Mon May 16 16:19:39 CEST 2011
Thank you for your comment, Prof. Harrell.
I would appreciate it very much if you could show me how to run the
simulation for that estimation. For reference, the following code shows
what I did (bootcov, summary, and validate).
MyFullModel.boot <- bootcov(MyFullModel, B=1000, coef.reps=TRUE)
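For reference, with coef.reps=TRUE the bootstrap replicates are kept on
the fit, so they can also be inspected directly (a sketch; boot.Coef is
the element bootcov stores):

b <- MyFullModel.boot$boot.Coef                    # B x p matrix of bootstrap coefficients
t(apply(b, 2, quantile, probs = c(0.025, 0.975)))  # nonparametric percentile 95% CIs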
> summary(MyFullModel, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0))
Effects Response : outcome
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
stenosis 70.0 80 10.0 -0.11 0.24 -0.59 0.37
Odds Ratio 70.0 80 10.0 0.90 NA 0.56 1.45
x1 1.5 2 0.5 1.21 0.37 0.49 1.94
Odds Ratio 1.5 2 0.5 3.36 NA 1.63 6.95
x2 1.5 2 0.5 -0.29 0.19 -0.65 0.08
Odds Ratio 1.5 2 0.5 0.75 NA 0.52 1.08
ClinicalScore 3.0 5 2.0 0.61 0.38 -0.14 1.36
Odds Ratio 3.0 5 2.0 1.84 NA 0.87 3.89
procedure - CAS:CEA 2.0 1 NA 0.83 0.46 -0.07 1.72
Odds Ratio 2.0 1 NA 2.28 NA 0.93 5.59
> summary(MyFullModel.boot, stenosis=c(70, 80), x1=c(1.5, 2.0),
x2=c(1.5, 2.0))
Effects Response : outcome
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
stenosis 70.0 80 10.0 -0.11 0.28 -0.65 0.43
Odds Ratio 70.0 80 10.0 0.90 NA 0.52 1.54
x1 1.5 2 0.5 1.21 0.29 0.65 1.77
Odds Ratio 1.5 2 0.5 3.36 NA 1.92 5.89
x2 1.5 2 0.5 -0.29 0.16 -0.59 0.02
Odds Ratio 1.5 2 0.5 0.75 NA 0.55 1.02
ClinicalScore 3.0 5 2.0 0.61 0.45 -0.28 1.50
Odds Ratio 3.0 5 2.0 1.84 NA 0.76 4.47
procedure - CAS:CEA 2.0 1 NA 0.83 0.38 0.07 1.58
Odds Ratio 2.0 1 NA 2.28 NA 1.08 4.85
> validate(MyFullModel, bw=FALSE, B=1000)
index.orig training test optimism index.corrected n
Dxy 0.6425 0.7054 0.6122 0.0932 0.5493 1000
R2 0.3270 0.3745 0.3330 0.0415 0.2855 1000
Intercept 0.0000 0.0000 0.0683 -0.0683 0.0683 1000
Slope 1.0000 1.0000 1.0465 -0.0465 1.0465 1000
Emax 0.0000 0.0000 0.0221 0.0221 0.0221 1000
D 0.2715 0.2795 0.2424 0.0371 0.2345 1000
U -0.0192 -0.0192 -0.0035 -0.0157 -0.0035 1000
Q 0.2908 0.2987 0.2460 0.0528 0.2380 1000
B 0.1265 0.1164 0.1332 -0.0168 0.1433 1000
g 1.3366 1.5041 1.5495 -0.0455 1.3821 1000
gp 0.2082 0.2172 0.2258 -0.0087 0.2169 1000
> validate(MyFullModel.boot, bw=FALSE, B=1000)
index.orig training test optimism index.corrected n
Dxy 0.6425 0.7015 0.6139 0.0877 0.5549 1000
R2 0.3270 0.3738 0.3346 0.0392 0.2878 1000
Intercept 0.0000 0.0000 0.0613 -0.0613 0.0613 1000
Slope 1.0000 1.0000 1.0569 -0.0569 1.0569 1000
Emax 0.0000 0.0000 0.0226 0.0226 0.0226 1000
D 0.2715 0.2805 0.2438 0.0367 0.2348 1000
U -0.0192 -0.0192 -0.0039 -0.0153 -0.0039 1000
Q 0.2908 0.2997 0.2477 0.0521 0.2387 1000
B 0.1265 0.1177 0.1329 -0.0153 0.1417 1000
g 1.3366 1.5020 1.5523 -0.0503 1.3869 1000
gp 0.2082 0.2191 0.2263 -0.0072 0.2154 1000
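In case it helps, here is my rough attempt at the kind of simulation you
suggest. It is only a sketch under my own assumptions (data generated to
mimic this study: n = 104, five standard-normal predictors, true
coefficients of 0.5 each, an intercept chosen to give roughly 25 events
on average; penalization is ignored for simplicity, nsim and B are kept
small for speed, and occasional non-convergent fits are not handled). It
compares the two standard-error estimators against the empirical
standard deviation of the coefficient estimates:

library(rms)
set.seed(1)
n <- 104; p <- 5; nsim <- 200
beta <- c(-1.5, rep(0.5, p))               # assumed truth (intercept first)
est <- wald.se <- boot.se <- matrix(NA, nsim, p)
for (i in 1:nsim) {
  X <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, plogis(drop(cbind(1, X) %*% beta)))
  d <- data.frame(y = y, X)                # columns y, X1, ..., X5
  f <- lrm(y ~ X1 + X2 + X3 + X4 + X5, data = d, x = TRUE, y = TRUE)
  g <- bootcov(f, B = 200)
  est[i, ]     <- coef(f)[-1]
  wald.se[i, ] <- sqrt(diag(vcov(f)))[-1]  # information-matrix (Wald) SEs
  boot.se[i, ] <- sqrt(diag(vcov(g)))[-1]  # bootstrap SEs
}
true.se <- apply(est, 2, sd)               # empirical SD approximates the true SE
colMeans(abs(sweep(wald.se, 2, true.se)))  # average absolute error, Wald
colMeans(abs(sweep(boot.se, 2, true.se)))  # average absolute error, bootstrap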
(11/05/16 22:01), Frank Harrell wrote:
> The choice is not clear, and requires some simulations to estimate the
> average absolute error of the covariance matrix estimators.
> Frank
>
>
> Kohkichi Hosoda wrote:
>>
>> Thank you for your reply, Prof. Harrell.
>>
>> I agree with you. Dropping only one variable does not actually help a lot.
>>
>> I have one more question.
>> During the analysis of this model I found that the confidence
>> intervals (CIs) of some coefficients provided by bootstrapping (the
>> bootcov function in the rms package) were narrower than the CIs
>> provided by the usual variance-covariance matrix, while the CIs of
>> other coefficients were wider. My data have no cluster structure.
>> I am wondering which CIs are better. I guess the bootstrap ones, but
>> is that right?
>>
>> Thank you in advance for your help.
>> --
>> KH
>>
>>
>>
>> (11/05/16 12:25), Frank Harrell wrote:
>>> I think you are doing this correctly except for one thing. The
>>> validation and other inferential calculations should be done on the
>>> full model. Use the approximate model to get a simpler nomogram but
>>> not to get standard errors. Since you are only dropping one variable,
>>> you might consider just running the nomogram on the entire model.
>>> Frank
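Following this suggestion, a sketch of the nomogram run on the entire
model (using the same probability breaks as for the approximate model
below):

full.model.nom <- nomogram(full.model,
    fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
plot(full.model.nom)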
>>>
>>>
>>> KH wrote:
>>>>
>>>> Hi,
>>>> I am trying to construct a logistic regression model from my data (104
>>>> patients and 25 events). I built a full model consisting of five
>>>> predictors, using penalization from the rms package (lrm, pentrace,
>>>> etc.) because of the events-per-variable issue. Then I tried to
>>>> approximate the full model by a step-down technique, predicting the
>>>> linear predictor L from all of the component variables using ordinary
>>>> least squares (ols in the rms package), as follows. I would like to
>>>> know whether I am doing this right or not.
>>>>
>>>>> library(rms)
>>>>> plogit <- predict(full.model)
>>>>> full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure,
>>>>> sigma=1)
>>>>> fastbw(full.ols, aics=1e10)
>>>>
>>>> Deleted Chi-Sq d.f. P Residual d.f. P AIC R2
>>>> stenosis 1.41 1 0.2354 1.41 1 0.2354 -0.59 0.991
>>>> x2 16.78 1 0.0000 18.19 2 0.0001 14.19 0.882
>>>> procedure 26.12 1 0.0000 44.31 3 0.0000 38.31 0.711
>>>> ClinicalScore 25.75 1 0.0000 70.06 4 0.0000 62.06 0.544
>>>> x1 83.42 1 0.0000 153.49 5 0.0000 143.49 0.000
>>>>
>>>> Then I fitted an approximation to the full model using the most
>>>> important variables (stopping before the R^2 for predictions from the
>>>> reduced model against the original Y drops below 0.95), that is,
>>>> dropping "stenosis".
>>>>
>>>>> full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
>>>>> full.ols.approx$stats
>>>> n Model L.R. d.f. R2 g Sigma
>>>> 104.0000000 487.9006640 4.0000000 0.9908257 1.3341718 0.1192622
>>>>
>>>> This approximate model had an R^2 of 0.99 against the full model.
>>>> Therefore, I updated the original full logistic model, dropping
>>>> "stenosis" as a predictor.
>>>>
>>>>> full.approx.lrm <- update(full.model, ~ . -stenosis)
>>>>
>>>>> validate(full.model, bw=FALSE, B=1000)
>>>> index.orig training test optimism index.corrected n
>>>> Dxy 0.6425 0.7017 0.6131 0.0887 0.5539 1000
>>>> R2 0.3270 0.3716 0.3335 0.0382 0.2888 1000
>>>> Intercept 0.0000 0.0000 0.0821 -0.0821 0.0821 1000
>>>> Slope 1.0000 1.0000 1.0548 -0.0548 1.0548 1000
>>>> Emax 0.0000 0.0000 0.0263 0.0263 0.0263 1000
>>>>
>>>>> validate(full.approx.lrm, bw=FALSE, B=1000)
>>>> index.orig training test optimism index.corrected n
>>>> Dxy 0.6446 0.6891 0.6265 0.0626 0.5820 1000
>>>> R2 0.3245 0.3592 0.3428 0.0164 0.3081 1000
>>>> Intercept 0.0000 0.0000 0.1281 -0.1281 0.1281 1000
>>>> Slope 1.0000 1.0000 1.1104 -0.1104 1.1104 1000
>>>> Emax 0.0000 0.0000 0.0444 0.0444 0.0444 1000
>>>>
>>>> Validation revealed this approximation was not bad.
>>>> Then I made a nomogram.
>>>>
>>>>> full.approx.lrm.nom <- nomogram(full.approx.lrm,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.approx.lrm.nom)
>>>>
>>>> Another nomogram, using the OLS model:
>>>>
>>>>> full.ols.approx.nom <- nomogram(full.ols.approx,
>>>> fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
>>>>> plot(full.ols.approx.nom)
>>>>
>>>> These two nomograms are very similar but differ slightly.
>>>>
>>>> My questions are:
>>>>
>>>> 1. Am I doing this right?
>>>>
>>>> 2. Which nomogram is correct?
>>>>
>>>> Thank you in advance for your help.
>>>>
>>>> --
>>>> KH
>>>>
>>>
>>>
>>> -----
>>> Frank Harrell
>>> Department of Biostatistics, Vanderbilt University
>>
>>
>> E-mail address
>> Office: khosoda at med.kobe-u.ac.jp
>> Home : khosoda at venus.dti.ne.jp
>>
>>
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt University
--
*************************************************
Department of Neurosurgery, Kobe University Graduate School of Medicine
Kohkichi Hosoda
7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
Phone: 078-382-5966
Fax: 078-382-5979
E-mail address
Office: khosoda at med.kobe-u.ac.jp
Home: khosoda at venus.dti.ne.jp