[R] gam

Simon Wood sw283 at maths.bath.ac.uk
Thu Jan 26 11:15:10 CET 2006

> I'm new to both R and to this list and would like to get
> advice on how to build generalized additive models in R.
> Based on the description of gam, which I found on the R
> website, I specified the following model:
> model1<-gam(ST~s(MOWST1),family=binomial,data=strikes.S),
> in which ST is my binary response variable and MOWST1 is a
> categorical independent variable.
> I get the following error message:
> Error in smooth.construct.tp.smooth.spec(object, data, knots) :
>          NA/NaN/Inf in foreign function call (arg 1)

- I guess passing a factor to s() should be trapped a bit earlier, so that
you get a more informative error message.
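
A minimal sketch of the kind of early check meant here, assuming it is run
by the user before calling gam. `smooth_vars` and `check_smooth_covariates`
are hypothetical helpers, not part of mgcv. They only inspect the unnamed
arguments of s() calls, so named arguments such as k= or by= are not flagged:

```r
# Hypothetical helper (not part of mgcv): names of the variables passed
# as unnamed arguments to s() anywhere in a formula's right-hand side.
smooth_vars <- function(expr) {
  if (!is.call(expr)) return(character(0))
  args <- as.list(expr)[-1]
  if (identical(expr[[1]], as.name("s"))) {
    nm <- names(args)
    if (is.null(nm)) nm <- rep("", length(args))
    # Named arguments such as k = 3 or by = z are skipped.
    return(as.character(unique(unlist(lapply(args[nm == ""], all.vars)))))
  }
  # Otherwise recurse into the sub-expressions (e.g. both sides of `+`).
  as.character(unique(unlist(lapply(args, smooth_vars))))
}

# Hypothetical check: stop with a readable message if any covariate
# used inside s() is not numeric (e.g. a factor or character column).
check_smooth_covariates <- function(formula, data) {
  vars <- smooth_vars(formula[[length(formula)]])
  bad <- vars[!vapply(vars, function(v) is.numeric(data[[v]]), logical(1))]
  if (length(bad) > 0)
    stop("s() needs numeric covariates; not numeric: ",
         paste(bad, collapse = ", "), call. = FALSE)
  invisible(TRUE)
}
```

For the model in the question, check_smooth_covariates(ST ~ s(MOWST1),
strikes.S) would stop with a message naming MOWST1, rather than failing
later inside the basis construction.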

- The basic problem is that gams are based around sums of smooth functions
of covariates. For the notion of smooth to be meaningful, the covariates
have to live in a space where you have at least a notion of distance
between covariate values, since in some loose sense `smooth' means that
f(x_1) must be close to f(x_2) if x_1 and x_2 are close. For factors you
don't generally have any notion of distance between the levels: if a
factor has levels "brick", "sky" and "purple", how far is it from "brick"
to "purple"?
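
For an unordered factor such as MOWST1, the usual alternative (assuming no
smoothness across levels is wanted) is to enter the factor as an ordinary
parametric term, which gam accepts just as glm does. A sketch on simulated
data, since strikes.S is not available here; the data frame `sim` and its
level probabilities are made up for illustration:

```r
library(mgcv)

set.seed(1)
n <- 300
# Simulated stand-in for the poster's data (hypothetical values).
sim <- data.frame(MOWST1 = factor(sample(c("brick", "sky", "purple"),
                                         n, replace = TRUE)))
p <- c(brick = 0.2, sky = 0.5, purple = 0.7)[as.character(sim$MOWST1)]
sim$ST <- rbinom(n, 1, p)

# The factor enters as a parametric term: one coefficient per
# non-reference level, with no smoothness assumed between levels.
model1 <- gam(ST ~ MOWST1, family = binomial, data = sim)
summary(model1)
```

Without any smooth terms this is just a binomial GLM, so the coefficients
should agree with those from glm() up to convergence tolerance.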

- Even if a factor is naturally ordered (e.g. "small", "medium", "large"),
you would still have to decide how to measure the smoothness/wiggliness of
a function of the factor. For this reason, I think it is actually better
to explicitly convert the levels of an ordered factor into numeric values
on a scale that you think is appropriate, and then use those numeric
values as the covariate in the gam. That way it's usually fairly easy to
get one of the mgcv built-in smoother classes to use the notion of
smoothness that you think is appropriate: if not, it's not too hard to
add a smoother class, following the template provided in ?p.spline
(in fact you could use this template to write a smoother class for
ordered categorical predictors).
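
A sketch of the conversion for an ordered factor, on simulated data. The
equally spaced scores 1, 2, 3 are an assumption that you would replace with
whatever spacing you consider appropriate; `size`, `size.num` and `sim2`
are hypothetical names:

```r
library(mgcv)

set.seed(2)
n <- 400
size <- factor(sample(c("small", "medium", "large"), n, replace = TRUE),
               levels = c("small", "medium", "large"), ordered = TRUE)
# Assumed scores chosen by the analyst: here equally spaced 1, 2, 3.
scores <- c(small = 1, medium = 2, large = 3)
sim2 <- data.frame(size = size,
                   size.num = unname(scores[as.character(size)]))
sim2$ST <- rbinom(n, 1, plogis(-1 + 0.6 * sim2$size.num))

# k cannot exceed the number of distinct covariate values (3 here).
model2 <- gam(ST ~ s(size.num, k = 3), family = binomial, data = sim2)
summary(model2)
```

With only three distinct values, k = 3 is the maximum, and the unpenalized
model could fit the three level means exactly; it is the smoothing penalty
that pulls the fit towards a linear function of the chosen scores.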
