[R] tests for measures of influence in regression

Mon Feb 22 14:00:06 CET 2010

I don't think this information can be found in the documentation, but you can always just check the actual influence.measures() and print.infl() code to find out. Most importantly, influence.measures() incldues the following code:

function (model)
{
    is.influential <- function(infmat, n) {
        k <- ncol(infmat) - 4
        if (n <= k)
            stop("too few cases, n < k")
        absmat <- abs(infmat)
        result <- cbind(absmat[, 1L:k] > 1, absmat[, k + 1] >
            3 * sqrt(k/(n - k)), abs(1 - infmat[, k + 2]) > (3 *
            k)/(n - k), pf(infmat[, k + 3], k, n - k) > 0.5,
            infmat[, k + 4] > (3 * k)/n)
        dimnames(result) <- dimnames(infmat)
        result
    }
    ...
    infmat <- cbind(dfbetas, dffit = dffits, cov.r = cov.ratio,
        cook.d = cooks.d, hat = h)
    ...
    is.inf <- is.influential(infmat, sum(h > 0))
    ...
}

So, a case is flagged if:

- any of its absolute dfbetas values are larger than 1, or
- its absolute dffits value is larger than 3*sqrt(k/(n-k)), or
- abs(1 - covratio) is larger than 3*k/(n-k), or
- its Cook's distance is larger than the 50% percentile of
  an F-distributio with k and n-k degrees of freedom, or
- its hatvalue is larger than 3*k/n,

where k denotes the number of model coefficients (e.g., k = 2 for simple regression with the intercept included in the model).

Best,

--
Wolfgang Viechtbauer                        http://www.wvbauer.com/
Department of Methodology and Statistics    Tel: +31 (43) 388-2277
School for Public Health and Primary Care   Office Location:
Maastricht University, P.O. Box 616         Room B2.01 (second floor)
6200 MD Maastricht, The Netherlands         Debyeplein 1 (Randwyck)

----Original Message----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Frank Tamborello
Sent: Monday, February 22, 2010 00:39 To: r-help at r-project.org
Subject: [R] tests for measures of influence in regression

> influence.measures gives several measures of influence for each
> observation (Cook's Distance, etc) and actually flags observations
> that it determines are influential by any of the measures. Looks
> good! But how does it discriminate between the influential and non-
> influential observations by each of the measures? Like does it do a
> Bonferroni-corrected t on the residuals identified by the influence
> measures or some other test?
>
> Cheers,
>
> Frank Tamborello, PhD
> W. M. Keck Postdoctoral Fellow
> School of Health Information Sciences
> University of Texas Health Science Center, Houston
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.