[R] gam() (in mgcv) with multiple interactions

Simon Wood s.wood at bath.ac.uk
Thu Jun 9 17:35:11 CEST 2011


I think that the main problem here is that smooths are not constrained 
to pass through the origin, so the covariate taking the value zero 
doesn't correspond to no effect in the way that you would like it to. 
Another way of putting this is that smooths are translation invariant: 
you get essentially the same inference from the model y_i = f(x_i) + e_i 
as from y_i = f(x_i + k) + e_i (which implies that x_i = 0 can have no 
special status).
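
The translation invariance can be checked numerically. The following is 
only a minimal sketch on simulated data: the variable names, the sine 
shaped truth and the shift k = 5 are all made up for illustration, and 
the default thin plate regression spline basis of s() is assumed.

```r
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)                              # covariate on [0, 1]
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)  # made-up truth plus noise
k <- 5                                     # arbitrary shift (assumption)
x_shift <- x + k

# Same model, once with x and once with the shifted covariate
fit0 <- gam(y ~ s(x), method = "REML")
fit1 <- gam(y ~ s(x_shift), method = "REML")

# Largest absolute difference between the two sets of fitted values
max_diff <- max(abs(fitted(fit0) - fitted(fit1)))
max_diff
```

If the two fits agree up to numerical error, then there is nothing in 
the fitted smooth that singles out x = 0.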

All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) +
te(a, d) is to remove the bases for te(a), te(b) and te(d) from the 
basis of te(a,b) and te(a,d). Further constraining te(a,b) and te(a,d) 
so that te(0,b) = te(a,0) = 0 etc. wouldn't make much sense (in general 0 
might not even be in the range of a and b).
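
For readers on a more recent mgcv than the one current when this was 
written: later versions provide ti(), which builds tensor product 
interaction terms with the main effects excluded, so a main effects 
plus interactions decomposition can be written directly. The sketch 
below is an illustration only. The data are simulated, and the names 
a, b, d and outcome and the "true" linear predictor are assumptions 
made up for the example, not the original dataset.

```r
library(mgcv)

set.seed(2)
n <- 2000
dat <- data.frame(a = runif(n), b = runif(n), d = runif(n))

# Made-up linear predictor with a:b and a:d interactions but no b:d
eta <- with(dat, sin(2 * pi * a) + b + a * b - 2 * a * d)
dat$outcome <- rbinom(n, 1, plogis(eta))

# One main effect per covariate, plus the two interactions of interest;
# ti(a, b) and ti(a, d) do not contain the main effects of a, b or d
fit <- gam(outcome ~ ti(a) + ti(b) + ti(d) + ti(a, b) + ti(a, d),
           family = binomial, data = dat, method = "REML")
summary(fit)
```

Each term still carries a sum-to-zero constraint for identifiability, 
so ti(a, b) evaluated at b = 0 is not forced to be zero, for the 
reasons given above.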

In general I find functional ANOVA not entirely intuitive to think 
about, but there is a very good book on it by Chong Gu (Smoothing spline 
ANOVA, 2002, Springer), and the associated package gss is on CRAN.

best,
Simon



On 07/06/11 17:00, Ben Haller wrote:
> Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as
> recommended to me earlier by some folks on this list.  I've run into
> a question to which I can't find the answer in his book, so I'm
> hoping somebody here knows.
>
> My outcome variable is binary, so I'm doing a binomial fit with
> gam().  I have five independent variables, all continuous, all
> uniformly distributed in [0, 1].  (This dataset is the result of a
> simulation model.)  Let's call them a,b,c,d,e for simplicity.  I'm
> interested in interactions such as a*b, so I'm using tensor product
> smooths such as te(a,b).  So far so good.  But I'm also interested
> in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these
> have a as a marginal basis (if I'm using the right terminology; all I
> mean is, both interactions involve a), and I would have expected them
> to share that basis; I would have expected them to be constrained
> such that the effect of a when b=0, for one, would be the same as the
> effect of a when d=0, for the other.  This would be just as, in a GLM
> with formula a*b + a*d, that formula would expand to a + b + d + a:b
> + a:d, and there is only one "a"; a doesn't get to be different for
> the a*b interaction than it is for the a*d interaction.  But with
> tensor product smooths in gam(), that does not seem to be the case.
> I'm still just getting to know mgcv and experimenting with things, so
> I may be doing something wrong; but the plots I have done of fits of
> this type appear to show different marginal effects.
>
> I tried explicitly including terms for the marginal basis; in my
> example, I tried a formula like te(a) + te(b) + te(d) + te(a, b) +
> te(a, d).  No dice; in this case, the main effect of a is different
> between all three places where it occurs in the model.  I.e. te(a)
> shows a different effect of a than te(a, b) shows at b=0, which is
> again different from the effect shown by te(a, d) at d=0.  I don't
> even know what that could possibly mean; it seems wrong to me that
> this could even be the case, but what do I know.  :->
>
> I could move up to a higher-order tensor like te(a,b,d), but there
> are three problems with that.  One, the b:d interaction (in my
> simplified example) is then also part of the model, and I'm not
> interested in it.  Two, given the set of interactions that I *am*
> interested in, I would actually be forced to do the full five-way
> te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think how
> long that will take to run, since it would have something like 5^5
> free parameters to fit; that doesn't seem worth pursuing.  And three,
> interpretation of a five-way interaction would be unpleasant, to say
> the least; I'd much rather be able to stay with just the two-way (and
> one three-way) interactions that I know are of interest (I know this
> from previous logistic regression modelling of the dataset).
>
> For those who like to see the actual R code, here are two fits I've
> tried:
>
> gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) +
>       te(slope, curvature, amplitude),
>     family=binomial, data=rla, method="REML")
>
> gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) +
>       te(dispersal) + te(slope, curvature) + te(slope, amplitude) +
>       te(curvature, amplitude) + te(acl, dispersal) +
>       te(amplitude, dispersal) + te(slope, curvature, amplitude),
>     family=binomial, data=rla, method="REML")
>
> So.  Any advice?  How can I correctly do a gam() fit involving
> multiple interactions that involve the same independent variable?
>
> Thanks!
>
> Ben Haller McGill University
>
> http://biology.mcgill.ca/grad/ben/


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283


