[R-sig-eco] bootstrapping gam models with multiple explanatory terms

Gavin Simpson gavin.simpson at ucl.ac.uk
Fri May 4 14:02:56 CEST 2012


Dear Basil,

You are describing a non-parametric bootstrap procedure. There is
nothing wrong with what you are doing, but the approach you are taking
to extract/plot the smooth terms is a little too simplistic and hence
you've hit a brick wall.

To bootstrap multiple terms, I would have my training data in a data
frame, say `train`, and draw a bootstrap sample from that. I would fit
my model in the loop like this

boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                data = train[k.star, , drop = FALSE])

notice how I am not fiddling with the model representation just the data
object used to fit the model.

Then I would use `predict(...., type = "terms")`, not `type = "link"`
(the default). `type = "terms"` returns a matrix of contributions of
terms in the model. In the above model you'd have a matrix with 2
columns, one for the smooth on `x` and one on `z`. These are the centred
smooth functions. To relate this to the values you were producing with
`predict(....)` in your example note that

predterms <- predict(boot.mod, newdata, type = "terms")
pred <- attr(predterms, "constant") + rowSums(predterms)

gives object `pred` which should be equivalent to `predict(boot.mod)`.
The constant is the model intercept, which is an attribute of the
returned object.

You could then plot each column of `pred` against the relevant column
from `newdata` to give the bootstrap smooth for each term. I would do
that with `lines()` to add them to the plot.

In addition, note that you could sample form the posterior distribution
of the model to generate something similar; the splines are associated
with parameters and the model estimates for these parameters form a
multivariate normal distribution. You can take random draws from the
distribution to examine the variation in the shapes that could be taken
by the fitted splines given the uncertainty in fitting. Simon Wood has
an example of this in his "GAM: an introduction with R book" and I used
this in a blog post recently:

http://wp.me/pZRQ9-2j

Simon's book also has an example of a parametric bootstrap where the
resampling is done from the model residuals to create new data from the
fitted model and then fit a new model to the new data.

HTH

G

On Thu, 2012-05-03 at 20:36 -0500, Basil Iannone wrote:
> Hello R users,
> 
> I may be thinking about this all wrong. If I am, please let me know.
> 
> My question is about bootstrapping gam models. I want to know if there is a
> way to produce bootstrapped smoothers in a gam model that has more than one
> significant explanatory term. I know how to achieve this when the model
> only has one term. The code that I am using to do so is below. It produces
> a nice graph of y values in response to my explanatory variable (x), along
> with bootstrapped smoothers in a grey color and the smoother produced by
> the gam package in black. In my mind this is a very useful graph to help
> determine the "realness" of the fitted smoother that gam produces. I should
> note a thanks to Charles Geyer who had code on his STATS 5601 website that
> has helped me immensely thus far.
> 
> x <- SI1 ## my explanatory variable
> y <- NO3.sp3 ## my dependent/response variable
> 
> model <- gam(y ~ s(x, bs = "cr")) ## I know I can use other smoother types.
> I am still experimenting with this aspect.
> plot(x, y)
> curve(predict(model, newdata = data.frame(x = x)), add = TRUE)
> n <- length(x)
> nboot <- 100
> for (i in 1:nboot) {
>    k.star <- sample(n, replace = TRUE)
>    x.star <- x[k.star]
>    y.star <- y[k.star]
>    model.star <- gam(y.star ~ s(x.star, bs = "cr"))
>    curve(predict(model.star, data.frame(x.star = x)),
>        add = TRUE, col = "grey")
> }
> points(x, y, pch = 16)
> curve(predict(model, newdata = data.frame(x = x)), add = TRUE, lwd = 2)
> 
> But now lets say I have a model with two explanatory variables ( x and z)
> and for the sake of this discussion, both terms are significant (i.e., the
> data is better explained by having them in the model). The model in this
> case would be:
> 
> new.model <- gam(y ~ s(x, bs = "cr")+ s(z, bs = "cr"))
> 
> I know by using "plot (new.model, resid = T, pch = 16)" I can see the
> smoothers and 95% CI produced by the gam package with the data values
> overlayed.
> 
> Is there a way to produce bootstrapped smoothers for each term from the
> same model (new.model) so that I can visualize them in the same way as the
> plot function allows me? I thought of having separate single-term models
> for each explanatory variable (Y~s(x) and y~s(z)), but doing so is not
> correct since removing one term from a model causes the parameter values of
> the other term to change. Any suggestions on the bootstrapping code that
> would produce such a graphical output would be appreciated, especially
> since I am new to the bootstrapping coding.
> 
> I hope that is clear and thanks in advance for any help that anyone can
> offer.
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-sig-ecology mailing list