[R] hatvalues?

Thu Mar 5 17:39:30 CET 2009

I am struggling a bit with the function 'hatvalues'.  I would like a little more understanding than treating it as a black box and using the values. I looked at the Fortran source and it is quite opaque to me, so I am asking for some help in understanding the theory. First, I take the simplest case of a single explanatory variable. For this I turn to John Fox's book, "Applied Regression Analysis and Generalized Linear Models", p. 245, and generate this R code:

> library(car)
> attach(Davis)
# remove the NA's
> narepwt <- repwt[!is.na(repwt)]
> meanrw <- mean(narepwt)
> drw <- narepwt - meanrw
> ssrw <- sum(drw * drw)
> h <- 1/length(narepwt) + (drw * drw)/ssrw
> h

This gives me a vector of values. Ordering them from largest to smallest gives

> order(h, decreasing=TRUE)
  [1]  21  52  17  93  30  62 158 113 175 131 182  29 106 125 123 146  91  99

So the largest "hatvalue" is 

> h[21]
[1] 0.1041207

This doesn't match the 0.714 value that is reported in the book, but I will probably take that up with the author later.

Then I use more of 'R' and I get

> fit <- lm(weight ~ repwt)
> hr <- hatvalues(fit)
> hr[21]
       21 
0.1041207 

So this matches, which is reassuring. My question is this: given the QR transformation and the residuals derived from that transformation, what is a simple matrix formula for the hatvalues?
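
On the QR side, one standard identity may be relevant here: if X = QR, with Q the thin n-by-p factor whose columns are orthonormal, then H = QQ', so each hat value is the sum of squares of the corresponding row of Q. The following is only a minimal sketch, assuming a full-rank fit. It uses the built-in 'cars' data as a stand-in for Davis (my substitution, so that it runs without the 'car' package), and the names Q and h_qr are illustrative.

```r
# Stand-in data: built-in 'cars' replaces Davis (an assumption for this sketch).
fit <- lm(dist ~ speed, data = cars)

# Thin Q factor from the QR decomposition that lm() stores in the fit.
# Keep only the first 'rank' columns, which span the column space of X.
Q <- qr.Q(fit$qr)[, seq_len(fit$rank), drop = FALSE]

# H = Q Q', so the i-th hat value is the squared length of row i of Q.
h_qr <- rowSums(Q^2)

# Compare with hatvalues(); names are dropped so only the numbers are compared.
all.equal(unname(hatvalues(fit)), h_qr)

# The hat values sum to the number of estimated coefficients (trace of H).
sum(h_qr)
```

Note that the residuals are not needed for this calculation: the hat values depend only on the design matrix.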

From http://en.wikipedia.org/wiki/Linear_regression I get

residuals = y - Hy = y(I - H)
or
H = -(residuals/y - I)

> fit <- lm(weight ~ repwt)
> h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] - diag(1,length(residuals(fit)), length(residuals(fit))))

This generates a matrix, but I cannot see any relationship between this "hat-matrix" and the values returned by "hatvalues".
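
For comparison, the usual textbook form of the relation is residuals = (I - H) y, which is a matrix-vector product. H therefore cannot be recovered by dividing the residuals elementwise by y. It is built from the design matrix alone, H = X (X'X)^-1 X', and hatvalues() returns its diagonal. Below is a small sketch under stated assumptions: it uses the built-in 'cars' data in place of Davis (my substitution, so that no extra package is needed), and X, y and H are illustrative names.

```r
# Stand-in data: built-in 'cars' replaces Davis (an assumption for this sketch).
fit <- lm(dist ~ speed, data = cars)

X <- model.matrix(fit)   # design matrix: intercept column plus 'speed'
y <- cars$dist           # response vector

# Explicit hat matrix H = X (X'X)^-1 X'.
# Fine for illustration; forming it is wasteful for large n.
H <- X %*% solve(crossprod(X), t(X))

# The diagonal of H reproduces hatvalues().
all.equal(unname(diag(H)), unname(hatvalues(fit)))

# The residuals are (I - H) y, a matrix product rather than an elementwise one.
all.equal(as.vector((diag(nrow(X)) - H) %*% y), unname(residuals(fit)))
```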

Comments?

Thank you.

Kevin