[R] Hat values for generalized additive models

Tue Sep 10 09:20:00 CEST 2002

Would anyone be able to provide insight for the following question, please?

Setting: estimation of prediction intervals for age-period-cohort models
using GAMs (rate ~ s(age,period))
Method: bootstrap (Davison and Hinkley, 1997)
Issue: standardisation of the residuals for resampling requires an
adjustment using the diagonals of the hat matrix.

Is there a simple way to get the hat values out of a GAM object? 

Looking through the code, I couldn't see anywhere where the hat matrix is
constructed. I tried the appended patch (against mgcv_0.8-1.tar.gz). This
returns sensible hat values for normal errors, however the hat values using
Poisson data don't look as good:

require(mgcv)
x=seq(0,1,length=1001)
mu=x

y=rnorm(1001,mu)
plot(gam(y~s(x))$hat,hat(glm(y~x)$qr))

y=rpois(1001,exp(mu))
plot(gam(y~s(x),family=poisson)$hat,hat(glm(y~x,family=poisson)$qr))


I realise that I could use a GLM to get the hat values, however that would
be clumsy. Any help would be appreciated.

Regards, Mark Clements.

(Apologies in advance for formatting etc. This is my first posting.)

diff -ur c:/usr/temp/work/mgcv/R/mgcv.r ./R/mgcv.r

--- c:/usr/temp/work/mgcv/R/mgcv.r	2002-05-17 03:04:26.000000000 +1000
+++ ./R/mgcv.r	2002-09-10 14:22:06.000000000 +1000
@@ -437,6 +437,7 @@
   edf<-array(0,m)       # estimated degrees of freedom
   ddiag<-array(0,3*m)   # array for diagonostics
   idiag<-array(0,3)     # array for diagnostics
+  hat<-array(0,n)       # array for diagnostics
   Vp[1,1]<-1.0
   M$gcv.ubre<-1.0;
   direct.mesh<-100      # number of points for overall s.p. initial direct
search
@@ -448,7 +449,8 @@
 
as.integer(n),as.integer(q),as.integer(C.r),as.double(M$sig2),as.double(Vp),
 
as.double(edf),as.double(M$conv.tol),as.integer(M$max.half),as.double(ddiag)
,
 
as.integer(idiag),as.double(sdiag),as.integer(direct.mesh),as.double(M$min.e
df),
-
as.double(M$gcv.ubre),as.double(M$target.edf),PACKAGE="mgcv")
+                 as.double(M$gcv.ubre),as.double(M$target.edf),
+         as.double(hat),PACKAGE="mgcv")
    
   p<-matrix(oo[[6]],q,1);
   sig2<-oo[[14]]
@@ -459,6 +461,7 @@
   idiag<-oo[[20]]
   sdiag<-oo[[21]]
   M$gcv.ubre<-oo[[24]]
+  M$hat <- oo[[26]]
 
conv<-list(edf=sdiag[1:direct.mesh],score=sdiag[direct.mesh+1:direct.mesh],g
=ddiag[1:m],h=ddiag[(m+1):(2*m)],
 
e=ddiag[(2*m+1):(3*m)],iter=idiag[1],init.ok=as.logical(idiag[2]),step.fail=
as.logical(idiag[3]))
   # unpack results back to correct place in output (which includes fixed
d.f. and free d.f. terms)
@@ -1247,7 +1250,7 @@
         null.deviance = nulldev, iter = iter, weights = wt, prior.weights =
weights, 
         #df.residual = resdf, 
         df.null = nulldf, y = y, converged = conv,sig2=G$sig2,edf=G$edf,
-        boundary = boundary,sp =
G$sp,df=G$df,nsdf=G$nsdf,Vp=G$Vp,mgcv.conv=G$conv,gcv.ubre=G$gcv.ubre)
+        boundary = boundary,sp =
G$sp,df=G$df,nsdf=G$nsdf,Vp=G$Vp,mgcv.conv=G$conv,gcv.ubre=G$gcv.ubre,
hat=G$hat)
 }
 
 
diff -ur c:/usr/temp/work/mgcv/src/mgcv.c ./src/mgcv.c
--- c:/usr/temp/work/mgcv/src/mgcv.c	2002-05-17 02:51:22.000000000 +1000
+++ ./src/mgcv.c	2002-09-10 15:47:55.000000000 +1000
@@ -619,7 +619,8 @@
           double *pd, double *sp,int *offd,int *dimd,int *md,
           int *nd,int *qd,int *rd,double *sig2d,double *Vpd,double *edf,
           double *conv_tol,int *ms_max_half,double *ddiag, int
*idiag,double *sdiag,
-          int *direct_mesh,double *min_edf,double *gcvubre,double
*target_edf)
+          int *direct_mesh,double *min_edf,double *gcvubre,double
*target_edf,
+	  double *hat)
 
 
 /* Solves :
@@ -781,6 +782,13 @@
       for (j=0;j<X.r;j++) for (k=off[i];k<off[i]+S[i].r;k++) 
       edf[i]+=X.M[j][k]*L.M[k][j];
     }
+    for (i=0;i<L.c;i++) 
+      { 
+	hat[i]=0.0;
+	for (j=0;j<L.r;j++)
+	  hat[i]+=X.M[i][j]*L.M[j][i]; 
+	  }
+    /*hat[0]=L.r; hat[1]=L.c; hat[2]=X.r; hat[3]=X.c;*/
     /* multiply Vp by estimated scale parameter so that it is proper
covariance matrix estimate */
     for (i=0;i<Vp.r;i++) for (j=0;j<Vp.c;j++) Vp.M[i][j] *= *sig2d;
     freemat(L);
diff -ur c:/usr/temp/work/mgcv/src/mgcv.h ./src/mgcv.h
--- c:/usr/temp/work/mgcv/src/mgcv.h	2002-05-09 03:18:28.000000000 +1000
+++ ./src/mgcv.h	2002-09-10 14:09:08.000000000 +1000
@@ -1,7 +1,7 @@
 void mgcv(double *yd,double *Xd,double *Cd,double *wd,double *Sd,
      double *pd, double *sp,int *offd,int *dimd,int *md,
      int *nd,int *qd,int *rd,double *sig2d, double *Vpd, double *edf,
double *conv_tol, int *ms_max_half,
-	 double *ddiag,int *idiag,double *sdiag, int *direct_mesh,double
*min_edf,double *gcvubre,double *target_edf);
+	 double *ddiag,int *idiag,double *sdiag, int *direct_mesh,double
*min_edf,double *gcvubre,double *target_edf, double *hat);
 void RGAMsetup(double *Xd,double *Cd,double *Sd,double *UZd,double *Xud,int
*xu,double *xpd,
              int *offd,double *xd,int  *md,int  *nd,int *dfd,int *nsdfd,int
*dim,int *s_type, int *p_order,
              int *by_exists, double *byd,double *knots, int *n_knots);
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._