[R] [Rd] Formulas in gam function of mgcv package

Tue Aug 25 11:00:26 CEST 2009

Dear Gavin / Rlings,

thanks for your kind answer and sorry for posting to the dev mailing list.

Concerning the specifics of your answer:

I am working with 6 to 36 covariates, and they are all centred and scaled. I 
represented the problem with two variables to simplify the question.

So ideally, the situation is:

1) y ~ s(x1) + .... + s(x36)

vs.

2) y ~ s(x1, ...., x36)

I am trying to build a predictive model. Since the variables are centred 
and scaled, I think I need an isotropic smooth. I am also interested in 
including the interactions between the variables, that is, in a model that 
is not purely additive.

It is not clear to me when I should give preference to tensor product 
smooths, possibly because I have not understood well how they work.
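
To make the comparison concrete, here is a minimal sketch of the three kinds 
of model I am trying to choose between, reduced to two covariates and fitted 
to simulated data. The data, the object names (dat, m_add, m_iso, m_te) and 
the choice of method = "REML" are assumptions made purely for illustration; 
they are not my real data or settings, and the default k is used throughout.

```r
library(mgcv)

## Simulated data (an assumption for illustration): two covariates on a
## common scale, with an x1:x2 interaction in the true signal.
set.seed(1)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + x1 * x2 + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

## 1) purely additive model: one univariate smooth per covariate
m_add <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")

## 2) one isotropic (thin plate) smooth of both covariates together
m_iso <- gam(y ~ s(x1, x2), data = dat, method = "REML")

## 3) tensor product smooth, which does not depend on the relative
##    scaling of x1 and x2
m_te <- gam(y ~ te(x1, x2), data = dat, method = "REML")

## Side-by-side comparison of the three fits
cmp <- AIC(m_add, m_iso, m_te)
print(cmp)
```

I realise that this two-covariate sketch says nothing about whether 
s(x1, ...., x36) is practical with 36 covariates; that is part of what I am 
asking.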

I am reading Wood (2003) as recommended, and I have also read rather 
extensively Simon N. Wood, Generalized Additive Models: An Introduction 
with R (2006), but I am still stuck. Any additional suggestions or reading 
recommendations would be greatly appreciated.

I also have some difficulty understanding the value you chose for k in the 
first example (why 60?).

Thanks

Best,

On Monday 24 August 2009 17:33:55 Gavin Simpson wrote:
> [Note R-Devel is the wrong list for such questions. R-Help is where this
> should have been directed - redirected there now]
>
> On Mon, 2009-08-24 at 17:02 +0100, Corrado wrote:
> > Dear R-experts,
> >
> > I have a question on the formulas used in the gam function of the mgcv
> > package.
> >
> > I am trying to understand the relationships between:
> >
> > y~s(x1)+s(x2)+s(x3)+s(x4)
> >
> > and
> >
> > y~s(x1,x2,x3,x4)
> >
> > Does the latter contain the former? what about the smoothers of all
> > interaction terms?
>
> I'm not 100% certain how this scales to smooths of more than 2
> variables, but Sections 4.10.2 and 5.2.2 of Simon Wood's book GAM: An
> Introduction with R (2006, Chapman & Hall/CRC) discuss this for smooths of
> 2 variables.
>
> Strictly y ~ s(x1) + s(x2) is not nested in y ~ s(x1, x2) as the bases
> used to produce the smoothers in the two models may not be the same in
> both models. One option to ensure nestedness is to fit the more
> complicated model as something like this:
>
> ## if simpler model were: y ~ s(x1, k=20) + s(x2, k = 20)
> y ~ s(x1, k=20) + s(x2, k = 20) + s(x1, x2, k = 60)
>                                   ^^^^^^^^^^^^^^^^^
> where the last term (^^^ above) has the same k as used in s(x1, x2)
>
> Note that these are isotropic smooths; are x1 and x2 measured in the
> same units etc.? Tensor product smooths may be more appropriate if not,
> and if we specify the bases when fitting models s(x1) + s(x2) *is*
> strictly nested in te(x1, x2), e.g.
>
> y ~ s(x1, bs = "cr", k = 10) + s(x2, bs = "cr", k = 10)
>
> is strictly nested within
>
> y ~ te(x1, x2, k = 10)
> ## is the same as y ~ te(x1, x2, bs = "cr", k = 10)
>
> [Note that bs = "cr" is the default basis in te() smooths, hence we
> don't need to specify it, and k = 10 refers to each individual smooth in
> the te().]
>
> HTH
>
> G
>
> > I have (tried to) read the manual pages of gam, formula.gam,
> > smooth.terms, linear.functional.terms but could not understand properly.
> >
> > Regards

-- 
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk