[Rd] Inaccuracy in DFBETA calculation for GLMs
Martin Maechler
maechler at stat.math.ethz.ch
Thu Oct 9 10:55:16 CEST 2025
>>>>> Ravi Varadhan via R-devel
>>>>> on Sat, 4 Oct 2025 13:34:48 +0000 writes:
> Hi,
> I have been calculating sensitivity diagnostics in GLMs. I am noticing that the dfbeta() and influence() functions in base R are inaccurate for non-Gaussian GLMs. Even though the help says that the DFBETAs can be inaccurate for GLMs, the accuracy can be substantially improved.
> I was thinking of writing this up along with a proper fix to R Journal but then started wondering whether this is a well-known issue and it has been addressed in other packages.
> Has the inaccuracy of DFBETA been addressed already?
> Thank you,
> Ravi
As nobody has replied so far: no, I have not yet heard about such
inaccuracies, and even less about whether and how they can be
substantially improved (I assume you have already "searched the net"
for that).
I agree that this would probably make a nice R Journal paper when
accompanied by both math and code.
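For example, a minimal sketch of such a comparison (entirely simulated
data, not Ravi's analysis) sets the one-step approximation returned by
dfbeta() against exact leave-one-out refits:

## Sketch: compare dfbeta()'s one-step approximation with exact
## case-deletion refits for a small logistic regression.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x))

fit <- glm(y ~ x, family = binomial, data = d)

## one-step (approximate) DFBETA, as computed by base R
dfb_approx <- dfbeta(fit)

## exact DFBETA: refit without observation i and take the difference
## (sign convention beta_hat - beta_hat_(-i), as in Belsley et al.)
dfb_exact <- t(sapply(seq_len(n), function(i)
    coef(fit) - coef(glm(y ~ x, family = binomial, data = d[-i, ]))))

## size of the discrepancy between approximation and exact deletion
max(abs(dfb_approx - dfb_exact))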
A subjective remark: having been statistically educated at ETH
Zurich and similar places (UW Seattle, Bellcore), I have become
convinced that such "leave-one-out" diagnostics do not provide
"true robustness" (against violations of the error distribution
assumptions etc.).  One should rather use M- (and MM-)estimation
approaches, which guarantee a breakdown point well above the
roughly 2/n you get with such L.o.o. diagnostics: just consider
the effect of one huge outlier masking another large one.
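For example, a minimal sketch of that alternative (again with simulated
data; glmrob() from the robustbase package is just one of several
M-type robust GLM fitters):

## Sketch: classical ML fit vs. a bounded-influence robust GLM fit
## in the presence of a single gross outlier in a Poisson response.
library(robustbase)

set.seed(2)
n <- 100
d <- data.frame(x = rnorm(n))
d$y <- rpois(n, exp(1 + 0.5 * d$x))
d$y[1] <- 100                     # one grossly outlying count

cbind(classical = coef(glm(y ~ x, family = poisson, data = d)),
      robust    = coef(glmrob(y ~ x, family = poisson, data = d)))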
For that reason, I would not want to substantially blow up our
base R code underlying DFBETA (which would then have to be
maintained for "all" future), but here I am speaking only for
myself and not for all of R Core (and even less for all
statisticians using R).
Martin