[R] How to estimate whether overfitting?
Bert Gunter
gunter.berton at gene.com
Mon May 10 18:05:16 CEST 2010
(Near) non-identifiability (especially in nonlinear models, which include
linear mixed effects models, Bayesian hierarchical models, etc.) is
typically a strong clue; it is usually signaled by software complaints
(e.g. convergence failures or hitting iteration limits).
However, this is a sufficient-ish condition, not a necessary one:
"over-fitting" frequently occurs even without such overt complaints. It
should also be said that, except for identifiability, "over-fitting" is
not a well-defined statistical term: it depends on the scientific context.
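
As a minimal sketch of over-fitting that produces no software complaint at
all, the following Python snippet compares R2 on a training set with R2 on a
held-out test set. It is not part of the original thread: the simulated data,
the 1-nearest-neighbour predictor and all function names are assumptions made
for illustration. Only the sample sizes (156 and 52) mirror the question
quoted below.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train_x, train_y, x):
    """Predict with the response of the single nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def simulate(n, rng):
    """Hypothetical data: y = 2x + Gaussian noise (sd 1), x uniform on [0, 3]."""
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(2010)  # fixed seed so the run is reproducible
train_x, train_y = simulate(156, rng)
test_x, test_y = simulate(52, rng)

# The model memorises the training set, so it fits the noise as well as the signal.
r2_train = r_squared(train_y, [predict_1nn(train_x, train_y, x) for x in train_x])
r2_test = r_squared(test_y, [predict_1nn(train_x, train_y, x) for x in test_x])
print(f"training R2 = {r2_train:.2f}, test R2 = {r2_test:.2f}")
```

Because a 1-nearest-neighbour rule reproduces each training response exactly,
training R2 is 1 by construction, while the noise in this simulation limits
the achievable test R2 to about 0.75 in expectation. A large gap between the
two is a warning sign, though how large a gap matters still depends on the
scientific context.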
Bert Gunter
Genentech Nonclinical Biostatistics
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Steve Lianoglou
Sent: Sunday, May 09, 2010 6:13 PM
To: David Winsemius
Cc: r-help at r-project.org; bbslover
Subject: Re: [R] How to estimate whether overfitting?
On Sun, May 9, 2010 at 11:53 AM, David Winsemius <dwinsemius at comcast.net>
wrote:
>
> On May 9, 2010, at 9:20 AM, bbslover wrote:
>
>>
>> 1. Is there some criterion for detecting overfitting, e.g. based on R2
>> and Q2 in the training set and R2 in the test set? For example, in my
>> data I have R2=0.94 for the training set and R2=0.70 for the test set;
>> is that overfitting?
>> 2. In this scatter plot, can one say there is overfitting?
>>
>> 3. My result was obtained with an SVM; there are 156 and 52 samples in
>> the training and test sets, and 96 predictors. In this case, can an SVM
>> be used for prediction? Is the number of predictors too large?
>>
>
> I think you need to buy a copy of Hastie, Tibshirani, and Friedman and do
> some self-study of chapters 7 and 12.
And you don't even have to buy it before you can start studying since
the PDF is available here:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
Having a hard cover is always handy, tho ..
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact