[R] What PRECISELY is the dfbetas() or lm.influence()$coef ?

Thu Jun 12 22:58:54 CEST 2003

Dear Hormuzd,

At 01:24 PM 6/12/2003 -0400, Katki, Hormuzd (NIH/NCI) wrote:
>         Hello.  I want to get the proper influence function for the glm
>coefficients in R.  This is supposed to be inv(information)*(y-yhat)*x.  So
>I am wondering what is the exact mathematical formula for the output that
>the functions:
>
>dfbeta()  OR   lm.influence()$coefficients
>
>return for a glm model.  I am confused because:
>
>1. Their columns don't sum to zero as influences should.

Even in a linear model, where the computation is exact, the columns do not 
sum to zero if influence is defined as the change in the coefficients upon 
deleting each observation in turn (i.e., as dfbeta).
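
A small check of this point for a linear model, offered as a sketch rather than a statement of how the functions are implemented. The data set (mtcars), the model formula, and all object names are illustrative assumptions. It compares the influence-function form from the question, (X'X)^{-1} x_i e_i, with the case-deletion form, which divides each row by (1 - h_i), and assumes the sign convention "full-data estimate minus deleted estimate".

```r
# Linear model on a built-in data set (illustrative choice, not from the thread).
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
XtXinv <- solve(crossprod(X))

# Influence-function form: row i is (X'X)^{-1} x_i e_i.
infl_fun <- (X * e) %*% XtXinv

# Case-deletion form: the same rows divided by (1 - h_i).
deletion <- infl_fun / (1 - h)

# What dfbeta() reports.
db <- dfbeta(fit)

# Brute force: refit without each case in turn
# (assumed sign convention: full-data estimate minus deleted estimate).
brute <- t(sapply(seq_len(nrow(mtcars)), function(i) {
  coef(fit) - coef(lm(mpg ~ wt + hp, data = mtcars[-i, ]))
}))

colSums(infl_fun)                    # zero up to rounding, because X'e = 0
colSums(db)                          # generally not zero
max(abs(unname(db) - unname(brute))) # agreement with actual deletion
```

The division by (1 - h_i) weights each case differently, which is why the deletion-based columns need not sum to zero even though the unweighted ones do.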

>2. They return different "influences", so the 2 functions are doing
>something different.

That's odd. I believe that dfbeta() for a GLM simply uses influence.glm, 
which has the same $coefficients component as lm.influence. As such, for a 
GLM, both are based on the last step of the IRLS fit -- i.e., a 
linearization of the model.
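
One way to check both statements, sketched under assumptions: the Poisson model on the built-in warpbreaks data and the object names are illustrative and not from the original question. The sketch compares the two routes directly, then compares them with coefficient changes obtained by actually refitting the GLM without each case.

```r
# Poisson GLM on a built-in data set (illustrative choice, not from the thread).
gfit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# The two routes named in the question.
via_dfbeta <- dfbeta(gfit)
via_infl   <- lm.influence(gfit)$coefficients

# TRUE if the two matrices agree (dimnames ignored).
same <- all.equal(unname(via_dfbeta), unname(via_infl))

# Refit with full IRLS after dropping each case in turn.
refit <- t(sapply(seq_len(nrow(warpbreaks)), function(i) {
  coef(gfit) - coef(glm(breaks ~ wool + tension, family = poisson,
                        data = warpbreaks[-i, ]))
}))

# Largest absolute difference between the one-step values and the refits.
max_gap <- max(abs(unname(via_dfbeta) - refit))

same
max_gap
```

A nonzero max_gap is what one would expect if the reported values come from a linearization at the converged fit rather than from re-iterating to convergence for every deleted case.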

>3. I think they divide each element by the standard error of the
>corresponding coefficient, but that's not enough to resolve any
>discrepancies

Perhaps you meant that dfbetas() [not dfbeta()] returns different values 
from lm.influence()$coef (as in your subject line)? dfbetas standardizes 
the coefficient changes by coefficient standard errors, using a deleted 
estimate of the dispersion parameter.
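
The standardization can be reproduced by hand as a rough check. The model, the data set, and the object names are illustrative assumptions, and the sketch takes the unscaled covariance matrix from summary() and the deleted scale estimates from lm.influence()$sigma rather than claiming this is exactly how dfbetas() is coded internally.

```r
# Same kind of illustrative Poisson GLM (not from the thread).
gfit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
infl <- lm.influence(gfit)

# Unscaled covariance matrix of the coefficients, (X'WX)^{-1}.
xxi <- summary(gfit)$cov.unscaled

# Divide row i of dfbeta by sigma_(i) * sqrt(diag(xxi)), where sigma_(i)
# is the scale estimate computed with case i deleted.
by_hand <- dfbeta(gfit) / outer(infl$sigma, sqrt(diag(xxi)))

# TRUE if the hand computation matches dfbetas() (dimnames ignored).
same <- all.equal(unname(dfbetas(gfit)), unname(by_hand))
same
```

Because the divisor changes from row to row, dfbetas() values are not a fixed rescaling of the lm.influence()$coefficients columns, which may account for the discrepancy described in point 3.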

>The documentation doesn't provide any details.  Any help would be greatly
>appreciated.

I hope that this helps,
  John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox