[Rd] Inaccuracy in DFBETA calculation for GLMs

Wed Oct 22 00:55:12 CEST 2025

Now that does sound interesting. 

Deviance residuals have caused trouble elsewhere. Notably they do not have mean zero, which messed up the residual vs fitted plots until we switched them to Pearson residuals. Someone in the murky past seems to have been in love with them, though...
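
A minimal sketch of the mean-zero point, on simulated low-count Poisson data. The data, seed, and variable names are made up for illustration and are not from this thread; it only compares the means of the two residual types for one fit.

```r
# Hypothetical example data: Poisson counts with small fitted means
set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(-0.5 + 0.5 * d$x))

fit <- glm(y ~ x, family = poisson, data = d)

# The two residual types discussed above
r_dev  <- residuals(fit, type = "deviance")
r_pear <- residuals(fit, type = "pearson")

# Compare their averages; a smoother in a residuals-vs-fitted plot
# will follow whatever systematic offset shows up here
c(deviance = mean(r_dev), pearson = mean(r_pear))
```

With small fitted means the deviance residuals tend to average visibly away from zero, while the Pearson residuals stay much closer to it (though not necessarily exactly at zero).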

Diagnostics for GLMs are tricky, witness the differences between binary and binomial (grouped) data, but we might at least try to get their definition right. 

Presumably, they will always be approximations (one-step) so as not to lose computational efficiency? However, I vaguely recall a paper from the 1980s by Pregibon, which did something with weights.
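
For concreteness, a rough sketch of what "one-step" means here, compared against brute-force refitting. Everything in it is an assumption made for illustration: the simulated data, the object names, and in particular the Pearson-residual form of the one-step update, which is the textbook weighted-least-squares deletion formula and not necessarily the patch being proposed. It also assumes the sign convention full-data coefficients minus leave-one-out coefficients.

```r
# Hypothetical example data
set.seed(1)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- rpois(n, lambda = exp(0.5 + 0.8 * d$x))
fit <- glm(y ~ x, family = poisson, data = d)

# (a) What base R currently returns
db_base <- dfbeta(fit)

# (b) Exact leave-one-out changes: refit n times (slow, but the reference)
db_exact <- t(sapply(seq_len(n), function(i) {
  coef(fit) - coef(glm(y ~ x, family = poisson, data = d[-i, ]))
}))

# (c) One-step approximation from the final IWLS fit, using Pearson
#     residuals: row i is (X'WX)^{-1} x_i sqrt(w_i) r_i / (1 - h_i)
X  <- model.matrix(fit)
w  <- weights(fit, type = "working")
h  <- hatvalues(fit)
rp <- residuals(fit, type = "pearson")
XtWXinv    <- solve(crossprod(X * sqrt(w)))          # (X'WX)^{-1}
db_pearson <- (X * (sqrt(w) * rp / (1 - h))) %*% XtWXinv

# Worst-case discrepancy of each approximation from the exact values
c(base    = max(abs(db_base    - db_exact)),
  pearson = max(abs(db_pearson - db_exact)))
```

Variant (c) costs no refits, so it keeps the computational efficiency of the current approach; only the residual fed into the update differs.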

If you have a fairly clean patch and maybe also a write-up about it and some tests, I think we should consider it. 

- Peter  

> On 15 Oct 2025, at 15.33, Ravi Varadhan via R-devel <r-devel using r-project.org> wrote:
> 
> Thank you, Martin.
> 
> I agree with the subjective remark.  But that's a different conversation!
> 
> The fix is quite easy. The difference mainly stems from the fact that R uses "deviance" residuals instead of "working" residuals.
> 
> I will proceed as per your advice.
> 
> Thanks & Best regards,
> Ravi
> 
> ________________________________
> From: R-devel <r-devel-bounces using r-project.org> on behalf of Martin Maechler <maechler using stat.math.ethz.ch>
> Sent: Thursday, October 9, 2025 04:55
> To: Ravi Varadhan <ravi.varadhan using jhu.edu>
> Cc: R Development List <R-devel using r-project.org>
> Subject: Re: [Rd] Inaccuracy in DFBETA calculation for GLMs
> 
>>>>>> Ravi Varadhan via R-devel
>>>>>>    on Sat, 4 Oct 2025 13:34:48 +0000 writes:
> 
>> Hi,
>> I have been calculating sensitivity diagnostics in GLMs.  I am noticing that the dfbeta() and influence() functions in base R are inaccurate for non-Gaussian GLMs.  Even though the help says that the DFBETAs can be inaccurate for GLMs, the accuracy can be substantially improved.
> 
>> I was thinking of writing this up, along with a proper fix, for the R Journal, but then started wondering whether this is a well-known issue that has already been addressed in other packages.
> 
>> Has the inaccuracy of DFBETA been addressed already?
> 
>> Thank you,
>> Ravi
> 
> As nobody has replied till now:  No, I haven't heard yet about
> such properties, and even less about whether and how they can be
> substantially improved (I assume you have "searched the net" for
> that).
> I agree that this would probably be a nice R Journal paper when
> accompanied by both math and code.
> 
> A subjective remark: Being statistically educated from ETH
> Zurich and similar places (UW Seattle, Bellcore): I've been
> convinced that such "leave-one-out" diagnostics are not
> providing "true robustness" (against violation of error
> distribution assumptions etc), but one should rather use M- (and
> MM-)estimation approaches providing a guaranteed breakdown point
> above 2/n (or so, which I think is what you get with such
> L.o.o. diagnostics: just look at the effect of one huge outlier
> masking a large one).
> 
> For that reason, I would not want to substantially blow up our
> base R code underlying DFBETA (which then has to be kept maintained
> into "all" future),  but then I'm only speaking for myself and
> not all of R core (and even less all of R using statisticians).
> 
> Martin
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com