[R] Overfitting/Calibration plots (Statistics question)

Mark Seeto markseeto at gmail.com
Fri Apr 9 02:48:32 CEST 2010


Thank you very much for your help, Prof. Harrell. My mistake was judging
the calibration plots by eye without actually fitting the regression
line: I was misreading slopes of 0.8 or 0.9 as slopes greater than 1.

Kind regards,
Mark Seeto


> Mark,
>
> Try
>
> set.seed(1)
> slope1 <- slope2 <- numeric(100)
>
> for(i in 1:100) {
>   # Six candidate predictors; only x1 and x2 actually affect y
>   x1 <- rnorm(200, 0, 1)
>   x2 <- rnorm(200, 0, 1)
>   x3 <- rnorm(200, 0, 1)
>   x4 <- rnorm(200, 0, 1)
>   x5 <- rnorm(200, 0, 1)
>   x6 <- rnorm(200, 0, 1)
>   y <- x1 + x2 + rnorm(200, 0, 2)
>   d <- data.frame(y, x1, x2, x3, x4, x5, x6)
>
>   # Fit on rows 1-100: all six predictors vs. the true two-predictor model
>   lm1 <- lm(y ~ ., data = d[1:100,])
>   lm2 <- lm(y ~ x1 + x2, data = d[1:100,])
>
>   # Calibration slope: regress held-out y (rows 101-200) on the predictions
>   slope1[i] <- coef(lsfit(predict(lm1, d[101:200, ]), d$y[101:200]))[2]
>   slope2[i] <- coef(lsfit(predict(lm2, d[101:200, ]), d$y[101:200]))[2]
> }
>
> mean(slope1)
> mean(slope2)
>
> I get
>
>> [1] 0.8873878
>> [1] 0.9603158
>
> Frank
>
>
>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                     Department of Biostatistics   Vanderbilt University
>
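The check Mark describes can be made concrete for a single split. The sketch below simulates data of the same kind as in Frank's loop (y depends on x1 and x2 only, four noise predictors), fits the six-predictor model on the first half, and draws the calibration plot on the holdout half with both the identity line (ideal slope 1) and the fitted calibration line overlaid, then prints the calibration slope so it is read off numerically rather than judged by eye. The names train, test, cal and cal_slope, and the use of lm() in place of lsfit(), are illustrative choices, not part of the original thread; the slope from any one split will vary from split to split.

```r
# Sketch, assuming the same data-generating model as Frank's simulation.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 6), ncol = 6, dimnames = list(NULL, paste0("x", 1:6)))
d <- data.frame(y = x[, 1] + x[, 2] + rnorm(n, 0, 2), x)

train <- d[1:100, ]
test  <- d[101:200, ]

fit  <- lm(y ~ ., data = train)        # all six predictors, four of them noise
pred <- predict(fit, newdata = test)   # predictions for the holdout half

# Calibration slope: regress observed holdout outcomes on the predictions
cal <- lm(test$y ~ pred)
cal_slope <- unname(coef(cal)[2])

plot(pred, test$y, xlab = "Predicted y", ylab = "Observed y")
abline(0, 1, lty = 2)                  # ideal calibration: intercept 0, slope 1
abline(cal, col = "red")               # fitted calibration line
print(cal_slope)
```

lm() and lsfit() give the same least-squares slope here; lm() is used only because its fit can be passed directly to abline() to draw the line on the plot.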


