[R] GAM: Overfitting
    Simon Wood 
    simon at stats.gla.ac.uk
       
    Wed Dec 22 12:08:17 CET 2004
    
    
  
> I am analyzing particulate matter data (PM10) on a small data set (147
> observations).  I fitted a semi-parametric model and am worried about
> overfitting.  How can one check for model fit in GAM?
- Keeping a random subset of the data as a validation set, fitting to the 
remaining data, and then comparing the R^2 (or proportion of deviance 
explained) on the fit set and on the validation set is usually quite 
diagnostic. If the fit data are much better predicted than the validation 
data, then you probably have over-fitting. 
- If your response is treated as Poisson then scale parameter estimates 
well below 1 are also diagnostic of over-fitting, but only if you are not 
expecting overdispersion, of course. 
- If you use gam from package mgcv then, by default, the model 
effective degrees of freedom are estimated from your data by GCV or an 
approximation to AIC. mgcv::gam allows you to increase the penalty on each 
model degree of freedom in these criteria, via the gam argument `gamma'. Some 
work by Kim and Gu (2004, J.Roy.Statist.Soc.B) suggests that gamma around 
1.4 can be a sensible choice for suppressing over-fitting, without 
much of a degradation in MSE performance.
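
As a rough sketch of the validation-set check and the `gamma' argument, 
something like the following could be used. The data here are simulated 
stand-ins (the variable names temp, wind and pm10, the 30% hold-out 
fraction and the Gaussian response are all assumptions for illustration, 
not taken from the original data), so treat it as a template rather than 
a recipe.

```r
library(mgcv)

set.seed(1)
n <- 147  # same size as the data set in the question

# Simulated stand-in data: hypothetical covariates and response
dat <- data.frame(temp = runif(n, 0, 30), wind = runif(n, 0, 10))
dat$pm10 <- 30 + 10 * sin(dat$temp / 5) - 1.5 * dat$wind + rnorm(n, sd = 5)

# Hold out a random 30% of the rows as a validation set (fraction is arbitrary)
val_idx <- sample(n, size = round(0.3 * n))
fit_dat <- dat[-val_idx, ]
val_dat <- dat[val_idx, ]

# R^2 of predictions against observations
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Default smoothing parameter selection, and the same model with a
# heavier penalty per effective degree of freedom (gamma = 1.4)
fit_default <- gam(pm10 ~ s(temp) + s(wind), data = fit_dat)
fit_gamma   <- gam(pm10 ~ s(temp) + s(wind), data = fit_dat, gamma = 1.4)

# Total effective degrees of freedom, plus R^2 on fit and validation sets
summarise_fit <- function(fit) {
  c(edf    = sum(fit$edf),
    r2_fit = r2(fit_dat$pm10, as.numeric(fitted(fit))),
    r2_val = r2(val_dat$pm10, as.numeric(predict(fit, newdata = val_dat))))
}

results <- rbind(default   = summarise_fit(fit_default),
                 gamma_1.4 = summarise_fit(fit_gamma))
print(round(results, 3))
```

A large gap between r2_fit and r2_val would point towards over-fitting. 
With only 147 observations the comparison depends quite a lot on which 
rows happen to be held out, so it is probably worth repeating it over 
several random splits.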
 
best,
Simon
    
    