gam.models {mgcv}    R Documentation
Specifying generalized additive models
Description
This page is intended to provide some more information on how to specify GAMs. A GAM is a GLM in which the linear predictor depends, in part, on a sum of smooth functions of predictors and (possibly) linear functionals of smooth functions of (possibly dummy) predictors.
Specifically, let y_i denote an independent random variable with mean \mu_i and an exponential family distribution, or failing that a known mean-variance relationship suitable for use of quasi-likelihood methods. Then the linear predictor of a GAM has a structure something like

g(\mu_i) = {\bf X}_i\beta + f_1(x_{1i},x_{2i}) + f_2(x_{3i}) + L_i f_3(x_4) + \ldots
where g is a known smooth monotonic ‘link’ function, {\bf X}_i\beta is the parametric part of the linear predictor, the x_j are predictor variables, the f_j are smooth functions, and L_i is some linear functional of f_3. There may of course be multiple linear functional terms, or none.
The key idea here is that the dependence of the response on the predictors can be represented as a parametric sub-model plus the sum of some (functionals of) smooth functions of one or more of the predictor variables. Thus the model is quite flexible relative to strictly parametric linear or generalized linear models, but still has much more structure than the completely general model that says that the response is just some smooth function of all the covariates.
Note one important point. In order for the model to be identifiable the smooth functions usually have to be constrained to have zero mean (usually taken over the set of covariate values). The constraint is needed if the term involving the smooth includes a constant function in its span. gam always applies such constraints unless there is a by variable present, in which case an assessment is made of whether the constraint is needed or not (see below).
The following sections discuss specifying model structures for gam. Specification of the distribution and link function is done using the family argument to gam and works in the same way as for glm. This page therefore concentrates on the model formula for gam.
Models with simple smooth terms
Consider the example model

g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + f_1(x_{3i}) + f_2(x_{4i},x_{5i})

where the response variable y_i has expectation \mu_i and g is a link function.
The gam formula for this would be:

y ~ x1 + x2 + s(x3) + s(x4,x5)
This would use the default basis for the smooths (a thin plate regression spline basis for each), with automatic selection of the effective degrees of freedom for both smooths. The dimension of the smoothing basis is given a default value as well (the dimension of the basis sets an upper limit on the maximum possible degrees of freedom for the basis - the limit is typically one less than the basis dimension). Full details of how to control smooths are given in s and te, and further discussion of basis dimension choice can be found in choose.k.
For the moment suppose that we would like to change the basis of the first smooth to a cubic regression spline basis with a dimension of 20, while fixing the second term at 25 degrees of freedom. The appropriate formula would be:

y ~ x1 + x2 + s(x3,bs="cr",k=20) + s(x4,x5,k=26,fx=TRUE)

(k=26 gives 25 degrees of freedom here, because the identifiability constraint absorbs one basis dimension).
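For illustration, here is a minimal runnable sketch of such a fit. The simulated truth below is invented purely for this example; only the model formula itself comes from the text above.

require(mgcv)
set.seed(1)
n <- 200
dat <- data.frame(x1=runif(n),x2=runif(n),x3=runif(n),
                  x4=runif(n),x5=runif(n))
## invented data generating process, for illustration only
dat$y <- dat$x1 + 2*dat$x2 + sin(2*pi*dat$x3) +
         dat$x4*dat$x5 + rnorm(n)*.3
b <- gam(y ~ x1 + x2 + s(x3,bs="cr",k=20) + s(x4,x5,k=26,fx=TRUE),
         data=dat)
summary(b) ## s(x4,x5) is reported with 25 (fixed) degrees of freedom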
The above assumes that x_4 and x_5 are naturally on similar scales (e.g. they might be co-ordinates), so that isotropic smoothing is appropriate. If this assumption is false then tensor product smoothing might be better (see te).
y ~ x1 + x2 + s(x3) + te(x4,x5)

would generate a tensor product smooth of x_4 and x_5. By default this smooth would have basis dimension 25 and use cubic regression spline marginals.
Varying the defaults is easy. For example

y ~ x1 + x2 + s(x3) + te(x4,x5,bs=c("cr","ps"),k=c(6,7))

specifies that the tensor product should use a rank 6 cubic regression spline marginal and a rank 7 P-spline marginal to create a smooth with basis dimension 42.
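A minimal sketch of such a tensor product fit follows; the data generating code is invented for illustration, with the two covariates deliberately put on very different scales.

require(mgcv)
set.seed(2)
n <- 300
x4 <- runif(n); x5 <- runif(n)*100 ## deliberately different scales
y <- sin(2*pi*x4)*cos(x5/20) + rnorm(n)*.2
bt <- gam(y ~ te(x4,x5,bs=c("cr","ps"),k=c(6,7)))
bt ## the estimated EDF will usually be well below the basis dimension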
Nested terms/functional ANOVA
Sometimes it is interesting to specify smooth models with a main effects + interaction structure such as

E(y_i) = f_1(x_i) + f_2(z_i) + f_3(x_i,z_i)

or

E(y_i) = f_1(x_i) + f_2(z_i) + f_3(v_i) + f_4(x_i,z_i) + f_5(x_i,v_i) + f_6(z_i,v_i) + f_7(x_i,z_i,v_i)

for example. Such models should be set up using ti terms in the model formula. For example:

y ~ ti(x) + ti(z) + ti(x,z)

or

y ~ ti(x) + ti(z) + ti(v) + ti(x,z) + ti(x,v) + ti(z,v) + ti(x,z,v)
The ti terms produce interactions with the component main effects excluded appropriately. (There is in fact no need to use ti terms for the main effects here; s terms could also be used.)
gam allows nesting (or ‘overlap’) of te and s smooths, and automatically generates side conditions to make such models identifiable, but the resulting models are much less stable and interpretable than those constructed using ti terms.
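A minimal sketch of a ti decomposition (the data generating code is invented for illustration):

require(mgcv)
set.seed(3)
n <- 400
x <- runif(n); z <- runif(n)
y <- sin(2*pi*x) + z^2 + 2*x*z + rnorm(n)*.2
bi <- gam(y ~ ti(x) + ti(z) + ti(x,z))
summary(bi) ## ti(x,z) carries only the interaction component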
‘by’ variables
by variables are the means for constructing ‘varying-coefficient models’ (geographic regression models) and for letting smooths ‘interact’ with factors or parametric terms. They are also the key to specifying general linear functionals of smooths.
The s and te terms used to specify smooths accept an argument by, which is a numeric or factor variable of the same dimension as the covariates of the smooth. If a by variable is numeric, then its i^{th} element multiplies the i^{th} row of the model matrix corresponding to the smooth term concerned.
Factor smooth interactions (see also factor.smooth.interaction).
If a by variable is a factor then it generates an indicator vector for each level of the factor, unless it is an ordered factor. In the non-ordered case, the model matrix for the smooth term is then replicated for each factor level, and each copy has its rows multiplied by the corresponding rows of its indicator variable. The smoothness penalties are also duplicated for each factor level. In short, a different smooth is generated for each factor level (the id argument to s and te can be used to force all such smooths to have the same smoothing parameter). ordered by variables are handled in the same way, except that no smooth is generated for the first level of the ordered factor (see the b3 example below). This is useful for setting up identifiable models when the same smooth occurs more than once in a model, with different factor by variables.
As an example, consider the model

E(y_i) = \beta_0 + f(x_i)z_i

where f is a smooth function and z_i is a numeric variable. The appropriate formula is:

y ~ s(x,by=z)

- the by argument ensures that the smooth function gets multiplied by covariate z. Note that when using factor by variables, centering constraints are applied to the smooths, which usually means that the by variable should be included as a parametric term as well. The example code below also illustrates the use of factor by variables.
by variables may be supplied as numeric matrices as part of specifying general linear functional terms. If a by variable is present and numeric (rather than a factor) then the corresponding smooth is only subjected to an identifiability constraint if (i) the by variable is a constant vector, (ii) for a matrix by variable, L, L%*%rep(1,ncol(L)) is constant, or (iii) a user defined smooth constructor supplies an identifiability constraint explicitly, and that constraint has an attribute "always.apply".
Linking smooths with ‘id’
It is sometimes desirable to insist that different smooth terms have the same degree of smoothness. This can be done by using the id argument to s or te terms. Smooths which share an id will have the same smoothing parameter. Really this only makes sense if the smooths use the same basis functions, and the default behaviour is to force this to happen: all smooths sharing an id have the same basis functions as the first smooth occurring with that id. Note that if you want exactly the same function for each smooth, then this is best achieved by making use of the summation convention covered under ‘linear functional terms’.
As an example suppose that E(y_i) \equiv \mu_i and

g(\mu_i) = f_1(x_{1i}) + f_2(x_{2i},x_{3i}) + f_3(x_{4i})

but that f_1 and f_3 should have the same smoothing parameters (and x_2 and x_3 are on different scales). Then the gam formula

y ~ s(x1,id=1) + te(x2,x3) + s(x4,id=1)

would achieve the desired result. id can be numbers or character strings. Giving an id to a term with a factor by variable causes the smooths at each level of the factor to have the same smoothing parameter.
Smooth term ids are not supported by gamm.
Linear functional terms
General linear functional terms have a long history in the spline literature, including in the penalized GLM context (see e.g. Wahba 1990). Such terms encompass varying coefficient models/geographic regression, functional GLMs (i.e. GLMs with functional predictors), GLASS models, etc., and allow smoothing with respect to aggregated covariate values, for example.
Such terms are implemented in mgcv using a simple ‘summation convention’ for smooth terms: if the covariates of a smooth are supplied as matrices, then summation of the evaluated smooth over the columns of the matrices is implied. Each covariate matrix and any by variable matrix must be of the same dimension. Consider, for example, the term

s(X,Z,by=L)

where X, Z and L are n \times p matrices. Let f denote the thin plate regression spline specified. The resulting contribution to the i^{th} element of the linear predictor is

\sum_{j=1}^p L_{ij} f(X_{ij},Z_{ij})

If no L is supplied then all its elements are taken as 1. In R code terms, let F denote the n \times p matrix obtained by evaluating the smooth at the values in X and Z. Then the contribution of the term to the linear predictor is rowSums(L*F) (note that it's element by element multiplication here!).
The summation convention applies to te terms as well as s terms. More details and examples are provided in linear.functional.terms.
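A minimal sketch of the summation convention, in which each response depends on a weighted sum of a smooth evaluated at p covariate values per observation (the data generating code is invented for illustration):

require(mgcv)
set.seed(5)
n <- 200; p <- 10
X <- matrix(runif(n*p),n,p)   ## n by p matrix of covariate values
L <- matrix(1/p,n,p)          ## weights: average over the columns
fx <- function(x) sin(2*pi*x) ## invented truth
y <- rowSums(L*fx(X)) + rnorm(n)*.1
bf <- gam(y ~ s(X,by=L))
plot(bf) ## estimate of the (centred) truth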
Random effects
Random effects can be added to gam models using s(...,bs="re") terms (see smooth.construct.re.smooth.spec), or the paraPen argument to gam covered below. See gam.vcomp, random.effects and smooth.construct.re.smooth.spec for further details. An alternative is to use the approach of gamm.
Penalizing the parametric terms
In case the ability to add smooth classes, smooth identities, by variables and the summation convention is still not sufficient to implement exactly the penalized GLM that you require, gam also allows you to penalize the parametric terms in the model formula. This is mostly useful in allowing one or more matrix terms to be included in the formula, along with a sequence of quadratic penalty matrices for each.
Suppose that you have set up a model matrix \bf X, and want to penalize the corresponding coefficients, \beta, with two penalties \beta^T {\bf S}_1 \beta and \beta^T {\bf S}_2 \beta. Then something like the following would be appropriate:

gam(y ~ X - 1,paraPen=list(X=list(S1,S2)))
The paraPen argument should be a list with elements having names corresponding to the terms being penalized. Each element of paraPen is itself a list, with optional elements L, rank and sp: all other elements must be penalty matrices. If present, rank is a vector giving the rank of each penalty matrix (if absent this is determined numerically). L is a matrix that maps underlying log smoothing parameters to the log smoothing parameters that actually multiply the individual quadratic penalties: it is taken as the identity if not supplied. sp is a vector of (underlying) smoothing parameter values: positive values are taken as fixed, while negative values signal that the smoothing parameter should be estimated. It is taken as all negative if not supplied.
An obvious application of paraPen is to incorporate random effects, and an example of this is provided below. In this case the supplied penalty matrices will be (generalized) inverse covariance matrices for the random effects, i.e. precision matrices. The final estimate of the covariance matrix corresponding to one of these penalties is given by the (generalized) inverse of the penalty matrix multiplied by the estimated scale parameter and divided by the estimated smoothing parameter for the penalty. For example, if you use an identity matrix to penalize some coefficients that are to be viewed as i.i.d. Gaussian random effects, then their estimated variance will be the estimated scale parameter divided by the estimate of the smoothing parameter for this penalty. See the ‘rail’ example below.
P-values for penalized parametric terms should be treated with caution. If you must have them, then use the option freq=TRUE in anova.gam and summary.gam, which will tend to give reasonable results for random effects implemented this way, but not for terms with a rank deficient penalty (or penalties with a wide eigen-spectrum).
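For example (a minimal usage sketch; rm here refers to the paraPen random effect fit constructed in the Examples section below):

summary(rm,freq=TRUE) ## frequentist p-values for the penalized term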
Author(s)
Simon N. Wood simon.wood@r-project.org
References
Wahba, G. (1990) Spline Models of Observational Data. SIAM.
Wood, S.N. (2017) Generalized Additive Models: An Introduction with R (2nd edition). Chapman and Hall/CRC Press.
Examples
require(mgcv)
set.seed(10)
## simulate data from y = f(x2)*x1 + error
dat <- gamSim(3,n=400)
b <- gam(y ~ s(x2,by=x1),data=dat)
plot(b,pages=1)
summary(b)
## Factor `by' variable example (with a spurious covariate x0)
## simulate data...
dat <- gamSim(4)
## fit model...
b <- gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(b,pages=1)
summary(b)
## note that the preceding fit is the same as....
b1 <- gam(y ~ s(x2,by=as.numeric(fac==1))+s(x2,by=as.numeric(fac==2))+
s(x2,by=as.numeric(fac==3))+s(x0)-1,data=dat)
## ... the `-1' is because the intercept is confounded with the
## *uncentred* smooths here.
plot(b1,pages=1)
summary(b1)
## repeat forcing all s(x2) terms to have the same smoothing param
## (not a very good idea for these data!)
b2 <- gam(y ~ fac+s(x2,by=fac,id=1)+s(x0),data=dat)
plot(b2,pages=1)
summary(b2)
## now repeat with a single reference level smooth, and
## two `difference' smooths...
dat$fac <- ordered(dat$fac)
b3 <- gam(y ~ fac+s(x2)+s(x2,by=fac)+s(x0),data=dat,method="REML")
plot(b3,pages=1)
summary(b3)
rm(dat)
## An example of a simple random effects term implemented via
## penalization of the parametric part of the model...
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each group has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
## now fit appropriate random effect model...
PP <- list(X=list(rank=20,diag(20)))
rm <- gam(y~ X+s(x0)+s(x1)+s(x2)+s(x3),data=dat,paraPen=PP)
plot(rm,pages=1)
## Get estimated random effects standard deviation...
sig.b <- sqrt(rm$sig2/rm$sp[1]);sig.b
## a much simpler approach uses "re" terms...
rm1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
gam.vcomp(rm1)
## Simple comparison with lme, using Rail data.
## See ?random.effects for a simpler method
require(nlme)
b0 <- lme(travel~1,data=Rail,~1|Rail,method="ML")
Z <- model.matrix(~Rail-1,data=Rail,
contrasts.arg=list(Rail="contr.treatment"))
b <- gam(travel~Z,data=Rail,paraPen=list(Z=list(diag(6))),method="ML")
b0
(b$reml.scale/b$sp)^.5 ## `gam' ML estimate of Rail sd
b$reml.scale^.5 ## `gam' ML estimate of residual sd
b0 <- lme(travel~1,data=Rail,~1|Rail,method="REML")
Z <- model.matrix(~Rail-1,data=Rail,
contrasts.arg=list(Rail="contr.treatment"))
b <- gam(travel~Z,data=Rail,paraPen=list(Z=list(diag(6))),method="REML")
b0
(b$reml.scale/b$sp)^.5 ## `gam' REML estimate of Rail sd
b$reml.scale^.5 ## `gam' REML estimate of residual sd
################################################################
## Approximate large dataset logistic regression for rare events
## based on subsampling the zeroes, and adding an offset to
## approximately allow for this.
## Doing the same thing, but upweighting the sampled zeroes
## leads to problems with smoothness selection, and CIs.
################################################################
n <- 50000 ## simulate n data
dat <- gamSim(1,n=n,dist="binary",scale=.33)
p <- binomial()$linkinv(dat$f-6) ## make 1's rare
dat$y <- rbinom(p,1,p) ## re-simulate rare response
## Now sample all the 1's but only proportion S of the 0's
S <- 0.02 ## sampling fraction of zeroes
dat <- dat[dat$y==1 | runif(n) < S,] ## sampling
## Create offset based on total sampling fraction
dat$s <- rep(log(nrow(dat)/n),nrow(dat))
lr.fit <- gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+s(x3,bs="cr")+
offset(s),family=binomial,data=dat,method="REML")
## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
plot(lr.fit,select=k,scale=0)
ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
ind <- sort.int(xx,index.return=TRUE)$ix
lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
rm(dat)
## A Gamma example, by modifying `gamSim' output...
dat <- gamSim(1,n=400,dist="normal",scale=1)
dat$f <- dat$f/4 ## true linear predictor
Ey <- exp(dat$f);scale <- .5 ## mean and GLM scale parameter
## Note that `shape' and `scale' in `rgamma' are almost
## opposite terminology to that used with GLM/GAM...
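## (note: `Ey*0' just supplies a vector whose length sets the
## number of gamma deviates generated)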
dat$y <- rgamma(Ey*0,shape=1/scale,scale=Ey*scale)
bg <- gam(y~ s(x0)+ s(x1)+s(x2)+s(x3),family=Gamma(link=log),
data=dat,method="REML")
plot(bg,pages=1,scheme=1)