[R] How to avoid overfitting in gam(mgcv)

Ariyo Kanno 10dimensioner at gmail.com
Wed Oct 3 15:06:00 CEST 2007


Thank you for your valuable advice.
Dr. Wood, I'm sorry that I mistakenly sent this reply to your personal
e-mail address first.

I will use the "min.sp" argument when the data set is very small. I'd
like to know whether there are any criteria for choosing "min.sp".
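A minimal sketch of the two options discussed in this thread: a lower bound on the smoothing parameter via `min.sp`, or a fixed smoothing parameter via `sp`. The simulated data, the sample size of 30 and the value 10 are arbitrary assumptions for illustration, not a recommended criterion for choosing `min.sp`. As I read the `?gam` help page, `min.sp` is added to the estimated smoothing parameter rather than replacing it, so the returned `sp` component does not include the bound; the sketch therefore compares total effective degrees of freedom instead.

```r
library(mgcv)

## Simulated stand-in for a small data set (purely illustrative assumption)
set.seed(1)
n   <- 30
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 2 * dat$x1 + sin(2 * pi * dat$x2) + rnorm(n, sd = 0.5)

## Smoothing parameter estimated freely by GCV (the gaussian default)
fit_free <- gam(y ~ x1 + s(x2), data = dat)

## Lower bound on the smoothing parameter; 10 is an arbitrary example value.
## One value per penalty: s(x2) with the default basis has a single penalty.
fit_min <- gam(y ~ x1 + s(x2), data = dat, min.sp = 10)

## Smoothing parameter fixed outright at the same arbitrary value
fit_fix <- gam(y ~ x1 + s(x2), data = dat, sp = 10)

## Total effective degrees of freedom (parametric terms plus the smooth)
edf <- c(free  = sum(fit_free$edf),
         min   = sum(fit_min$edf),
         fixed = sum(fit_fix$edf))
print(edf)
```

Because the bounded fit can only smooth at least as heavily as the fixed one, its edf should not exceed the fixed-`sp` edf; how much it shrinks relative to the free fit depends on the data.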

I compared gamma=1.0 and gamma=1.4, and by comparing the edf and the
smoothing parameter I could see the extra smoothing produced by
increasing gamma. However, it was not enough to suppress overfitting
when the data set was small.

By "overfitting" I mean that the GCV score was significantly larger
than the mean squared prediction error on the validation data, which
were randomly selected and not used for fitting.
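The comparison described above (gamma = 1.0 versus gamma = 1.4, judged against a random hold-out set) might be set up roughly as follows. This is a sketch under stated assumptions: the simulated data, the sample size of 60 and the one-third hold-out fraction are illustrative choices, not the data from this thread. `gcv.ubre` is the smoothing-selection score that `gam` returns, and `val_mse` is the mean squared prediction error on the held-out rows.

```r
library(mgcv)

## Simulated data (illustrative assumption, not the original data set)
set.seed(2)
n   <- 60
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 2 * dat$x1 + sin(2 * pi * dat$x2) + rnorm(n, sd = 0.5)

## Randomly hold out a third of the rows; they are never used for fitting
val   <- sample(n, n %/% 3)
train <- dat[-val, ]
test  <- dat[val, ]

## Fit with the default gamma = 1 and with the inflated gamma = 1.4
res <- t(sapply(c(1, 1.4), function(g) {
  fit  <- gam(y ~ x1 + s(x2), data = train, gamma = g)
  pred <- predict(fit, newdata = test)
  c(gamma   = g,
    edf     = sum(fit$edf),              # total effective degrees of freedom
    sp      = unname(fit$sp),            # estimated smoothing parameter of s(x2)
    gcv     = unname(fit$gcv.ubre),      # selection score returned by gam
    val_mse = mean((test$y - pred)^2))   # prediction error on held-out rows
}))
print(res)
```

Each row of `res` summarises one value of `gamma`, so the effect of the extra smoothing on edf, on the selected smoothing parameter and on the hold-out error can be read side by side.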

Best Wishes,
Ariyo

2007/10/3, Simon Wood <s.wood at bath.ac.uk>:
> On Wednesday 03 October 2007 10:49, Ariyo Kanno wrote:
> > I appreciate your quick reply.
> > I am using a model with the following structure:
> >
> > fit <- gam(y~x1+s(x2))
> >
> > where y, x1 and x2 are quantitative variables,
> > so the response distribution is assumed to be Gaussian (the default).
> >
> > Now I understand that the data set was too small.
> -- Well, the 10-observation end of that range is definitely too small, but you
> can get quite reasonable estimates of a single smoothing parameter from 30+
> Gaussian data points.
> -- You can force smoother models by either setting the smoothing parameter
> yourself using the `sp' argument to `gam', or by using the `min.sp' argument
> to set a lower bound on the smoothing parameter.
> -- I'm surprised that `gamma' had no effect - how high did you try?
>
> best,
> Simon
>
>
>
> > Thank you.
> >
> > Best Wishes,
> >
> > Ariyo
> >
> > 2007/10/3, Simon Wood <s.wood at bath.ac.uk>:
> > > What sort of model structure are you using? In particular, what is the
> > > response distribution? For Poisson and binomial responses, overfitting can
> > > be a sign of overdispersion, and quasipoisson or quasibinomial may be better.
> > > Also, I would not expect to get useful smoothing parameter estimates from
> > > 10 data points!
> > >
> > > best,
> > > Simon
> > >
> > > On Wednesday 03 October 2007 06:55, Ariyo Kanno wrote:
> > > > Dear listers,
> > > >
> > > > I'm using gam (from mgcv) for semiparametric regression on small,
> > > > noisy datasets (10 to 200 observations), and I am facing a problem
> > > > with overfitting.
> > > >
> > > > The book (Simon N. Wood, Generalized Additive Models: An
> > > > Introduction with R) suggests avoiding overfitting by inflating the
> > > > effective degrees of freedom in the GCV score with an increased
> > > > "gamma" value (e.g. 1.4). In my case, however, this did not change
> > > > the results significantly.
> > > >
> > > > The only way I've found to suppress overfitting is to set the basis
> > > > dimension "k" to very low values (3 to 5). However, I don't think
> > > > this is reasonable, because knot selection then becomes an important
> > > > issue.
> > > >
> > > > Is there any other way to avoid overfitting when analyzing small
> > > > datasets?
> > > >
> > > > Thank you for your help in advance,
> > > > Ariyo Kanno
> > > >
> > > > --
> > > > Ariyo Kanno
> > > > First-year doctoral student at
> > > > Institute of Environmental Studies,
> > > > The University of Tokyo
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html and provide commented,
> > > > minimal, self-contained, reproducible code.
> > >
> > > --
> > > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > > +44 1225 386603  www.maths.bath.ac.uk/~sw283
> > >
>
>