[R] How to estimate whether overfitting?

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Mon May 10 04:13:02 CEST 2010


On 05/09/2010 10:53 AM, David Winsemius wrote:
>
> On May 9, 2010, at 9:20 AM, bbslover wrote:
>
>>
>> 1. Is there a criterion for detecting overfitting, e.g. based on R2 and Q2
>> in the training set and R2 in the test set? For example, in my data I have
>> R2=0.94 for the training set and R2=0.70 for the test set. Is that
>> overfitting?
>> 2. From this scatter plot, can one say the model is overfitting?
>>
>> 3. My result was obtained with svm. The training and test sets have 156 and
>> 52 samples, and there are 96 predictors. In this case, can svm be used for
>> prediction? Are there too many predictors?

Your test sample is too small by a factor of 100 for split-sample
validation to work well.
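
One way to see how unstable a 52-observation test set can be is to repeat the
split many times and look at the spread of the test-set R2. The sketch below is
only an illustration under stated assumptions: the data are simulated (208 rows
and 96 predictors, matching the sizes in the question, with signal placed
arbitrarily in the first 5 predictors), and ordinary least squares from base R
stands in for svm so that it runs without extra packages. The names r2 and
one_split are made up for this example.

```r
# Simulated data only; nothing here uses the poster's pkc-svm.txt.
set.seed(1)
n <- 208   # 156 training + 52 test, as in the question
p <- 96    # number of predictors, as in the question
X <- matrix(rnorm(n * p), n, p)
# Assumption: only the first 5 predictors carry any signal.
y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(n, sd = 2)
dat <- data.frame(y = y, X)   # predictor columns are named X1 ... X96

# R2 as 1 - SSE/SST, computed on whichever sample is passed in.
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# One random 156/52 split: fit on the training rows, score both parts.
one_split <- function() {
  test <- sample(n, 52)
  fit <- lm(y ~ ., data = dat[-test, ])   # lm is a stand-in for svm
  c(train = r2(dat$y[-test], fitted(fit)),
    test  = r2(dat$y[test], predict(fit, newdata = dat[test, ])))
}

# Repeat the split 200 times; each row of res is one split.
res <- t(replicate(200, one_split()))
summary(res[, "train"])
summary(res[, "test"])
range(res[, "test"])   # spread of test R2 across splits
```

If the range of the test R2 across splits is wide, a single split tells you
little about whether one particular train/test gap reflects overfitting.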

Frank

>>
>
> I think you need to buy a copy of Hastie, Tibshirani, and Friedman and
> do some self-study of chapters 7 and 12.
>
>
>> 4. From this picture, can you give me some suggestions to improve model
>> performance? Is the picture bad?
>>
>>
>> 5. The picture and data are below.
>> Thank you!
>>
>>
>> http://n4.nabble.com/file/n2164417/scatter.jpg scatter.jpg
>>
>> http://n4.nabble.com/file/n2164417/pkc-svm.txt pkc-svm.txt
>> --


-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


