magic {mgcv}  R Documentation 
Stable Multiple Smoothing Parameter Estimation by GCV or UBRE
Description
Function to efficiently estimate smoothing parameters in generalized ridge regression problems with multiple (quadratic) penalties, by GCV or UBRE. The function uses Newton's method in multidimensions, backed up by steepest descent to iteratively adjust the smoothing parameters for each penalty (one penalty may have a smoothing parameter fixed at 1).
For maximal numerical stability the method is based on orthogonal decomposition methods, and attempts to deal with numerical rank deficiency gracefully using a truncated singular value decomposition approach.
Usage
magic(y,X,sp,S,off,L=NULL,lsp0=NULL,rank=NULL,H=NULL,C=NULL,
w=NULL,gamma=1,scale=1,gcv=TRUE,ridge.parameter=NULL,
control=list(tol=1e6,step.half=25,rank.tol=
.Machine$double.eps^0.5),extra.rss=0,n.score=length(y),nthreads=1)
Arguments
y 
is the response data vector. 
X 
is the model matrix (more columns than rows are allowed). 
sp 
is the array of smoothing parameters. The vector 
S 
is a list of of penalty matrices. 
off 
is an array indicating the first parameter in the parameter vector that is
penalized by the penalty involving 
L 
is a matrix mapping 
lsp0 
If 
rank 
is an array specifying the ranks of the penalties. This is useful, but not essential, for forming square roots of the penalty matrices. 
H 
is the optional offset penalty  i.e. a penalty with a smoothing parameter fixed at 1. This is useful for allowing regularization of the estimation process, fixed smoothing penalties etc. 
C 
is the optional matrix specifying any linear equality constraints on the fitting
problem. If 
w 
the regression weights. If this is a matrix then it is taken as being the
square root of the inverse of the covariance matrix of 
gamma 
is an inflation factor for the model degrees of freedom in the GCV or UBRE score. 
scale 
is the scale parameter for use with UBRE. 
gcv 
should be set to 
ridge.parameter 
It is sometimes useful to apply a ridge penalty to the fitting problem, penalizing the parameters in the constrained space directly. Setting this parameter to a value greater than zero will cause such a penalty to be used, with the magnitude given by the parameter value. 
control 
is a list of iteration control constants with the following elements:

extra.rss 
is a constant to be added to the residual sum of squares
(squared norm) term in the calculation of the GCV, UBRE and scale parameter
estimate. In conjuction with 
n.score 
number to use as the number of data in GCV/UBRE score calculation: usually the actual number of data, but there are methods for dealing with very large datasets that change this. 
nthreads 

Details
The method is a computationally efficient means of applying GCV or UBRE (often approximately AIC) to the problem of smoothing parameter selection in generalized ridge regression problems of the form:
minimise~ \ { \bf W} ({ \bf Xb  y} ) \^2 + {\bf b}^\prime {\bf Hb} + \sum_{i=1}^m
\theta_i {\bf b^\prime S}_i{\bf b}
possibly subject to constraints {\bf Cb}={\bf 0}
.
{\bf X}
is a design matrix, \bf b
a parameter vector,
\bf y
a data vector, \bf W
a weight matrix,
{\bf S}_i
a positive semidefinite matrix of coefficients
defining the ith penalty with associated smoothing parameter \theta_i
,
\bf H
is the positive semidefinite offset penalty matrix and \bf C
a
matrix of coefficients defining any linear equality constraints on the problem.
{\bf X}
need not be of full column rank.
The \theta_i
are chosen to minimize either the GCV score:
V_g = \frac{n\{\bf W}({\bf y}  {\bf Ay})\^2}{[tr({\bf I}  \gamma {\bf A})]^2}
or the UBRE score:
V_u=\{\bf W}({\bf y}{\bf Ay})\^2/n2 \phi tr({\bf I}\gamma {\bf A})/n + \phi
where \gamma
is gamma
the inflation factor for degrees of freedom (usually set to 1) and \phi
is scale
, the scale parameter. \bf A
is the hat matrix (influence matrix) for the fitting problem (i.e
the matrix mapping data to fitted values). Dependence of the scores on the smoothing parameters is through \bf A
.
The method operates by Newton or steepest descent updates of the logs of the
\theta_i
. A key aspect of the method is stable and economical calculation of the
first and second derivatives of the scores w.r.t. the log smoothing parameters.
Because the GCV/UBRE scores are flat w.r.t. very large or very small \theta_i
,
it's important to get good starting parameters, and to be careful not to step into a flat region
of the smoothing parameter space. For this reason the algorithm rescales any Newton step that
would result in a log(\theta_i)
change of more than 5. Newton steps are
only used if the Hessian of the GCV/UBRE is postive definite, otherwise steepest descent is
used. Similarly steepest descent is used if the Newton step has to be contracted too far
(indicating that the quadratic model underlying Newton is poor). All initial steepest descent
steps are scaled so that their largest component is 1. However a step is calculated,
it is never expanded if it is successful (to avoid flat portions of the objective),
but steps are successively halved if they do not decrease the GCV/UBRE score, until
they do, or the direction is deemed to have failed. (Given the smoothing parameters the optimal
\bf b
parameters are easily found.)
The method is coded in C
with matrix factorizations performed using LINPACK and LAPACK routines.
Value
The function returns a list with the following items:
b 
The best fit parameters given the estimated smoothing parameters. 
scale 
the estimated (GCV) or supplied (UBRE) scale parameter. 
score 
the minimized GCV or UBRE score. 
sp 
an array of the estimated smoothing parameters. 
sp.full 
an array of the smoothing parameters that actually multiply the elements of

rV 
a factored form of the parameter covariance matrix. The (Bayesian) covariance
matrix of the parametes 
gcv.info 
is a list of information about the performance of the method with the following elements:

Note that some further useful quantities can be obtained using magic.post.proc
.
Author(s)
Simon N. Wood simon.wood@rproject.org
References
Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673686
https://www.maths.ed.ac.uk/~swood34/
See Also
Examples
## Use `magic' for a standard additive model fit ...
library(mgcv)
set.seed(1);n < 200;sig < 1
dat < gamSim(1,n=n,scale=sig)
k < 30
## set up additive model
G < gam(y~s(x0,k=k)+s(x1,k=k)+s(x2,k=k)+s(x3,k=k),fit=FALSE,data=dat)
## fit using magic (and gam default tolerance)
mgfit < magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,
control=list(tol=1e7,step.half=15))
## and fit using gam as consistency check
b < gam(G=G)
mgfit$sp;b$sp # compare smoothing parameter estimates
edf < magic.post.proc(G$X,mgfit,G$w)$edf # get e.d.f. per param
range(edfb$edf) # compare
## p>n example... fit model to first 100 data only, so more
## params than data...
mgfit < magic(G$y[1:100],G$X[1:100,],G$sp,G$S,G$off,rank=G$rank)
edf < magic.post.proc(G$X[1:100,],mgfit,G$w[1:100])$edf
## constrain first two smooths to have identical smoothing parameters
L < diag(3);L < rbind(L[1,],L)
mgfit < magic(G$y,G$X,rep(1,3),G$S,G$off,L=L,rank=G$rank,C=G$C)
## Now a correlated data example ...
library(nlme)
## simulate truth
set.seed(1);n<400;sig<2
x < 0:(n1)/(n1)
f < 0.2*x^11*(10*(1x))^6+10*(10*x)^3*(1x)^10
## produce scaled covariance matrix for AR1 errors...
V < corMatrix(Initialize(corAR1(.6),data.frame(x=x)))
Cv < chol(V) # t(Cv)%*%Cv=V
## Simulate AR1 errors ...
e < t(Cv)%*%rnorm(n,0,sig) # so cov(e) = V * sig^2
## Observe truth + AR1 errors
y < f + e
## GAM ignoring correlation
par(mfrow=c(1,2))
b < gam(y~s(x,k=20))
plot(b);lines(x,fmean(f),col=2);title("Ignoring correlation")
## Fit smooth, taking account of *known* correlation...
w < solve(t(Cv)) # V^{1} = w'w
## Use `gam' to set up model for fitting...
G < gam(y~s(x,k=20),fit=FALSE)
## fit using magic, with weight *matrix*
mgfit < magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
## Modify previous gam object using new fit, for plotting...
mg.stuff < magic.post.proc(G$X,mgfit,w)
b$edf < mg.stuff$edf;b$Vp < mg.stuff$Vb
b$coefficients < mgfit$b
plot(b);lines(x,fmean(f),col=2);title("Known correlation")