influence.measures {stats}  R Documentation 
This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.
influence.measures(model)

rstandard(model, ...)
## S3 method for class 'lm'
rstandard(model, infl = lm.influence(model, do.coef = FALSE),
          sd = sqrt(deviance(model)/df.residual(model)),
          type = c("sd.1", "predictive"), ...)
## S3 method for class 'glm'
rstandard(model, infl = influence(model, do.coef = FALSE),
          type = c("deviance", "pearson"), ...)

rstudent(model, ...)
## S3 method for class 'lm'
rstudent(model, infl = lm.influence(model, do.coef = FALSE),
         res = infl$wt.res, ...)
## S3 method for class 'glm'
rstudent(model, infl = influence(model, do.coef = FALSE), ...)

dffits(model, infl = , res = )

dfbeta(model, ...)
## S3 method for class 'lm'
dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

dfbetas(model, ...)
## S3 method for class 'lm'
dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

covratio(model, infl = lm.influence(model, do.coef = FALSE),
         res = weighted.residuals(model))

cooks.distance(model, ...)
## S3 method for class 'lm'
cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
               res = weighted.residuals(model),
               sd = sqrt(deviance(model)/df.residual(model)),
               hat = infl$hat, ...)
## S3 method for class 'glm'
cooks.distance(model, infl = influence(model, do.coef = FALSE),
               res = infl$pear.res,
               dispersion = summary(model)$dispersion,
               hat = infl$hat, ...)

hatvalues(model, ...)
## S3 method for class 'lm'
hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

hat(x, intercept = TRUE)
model
a fitted model object, typically as returned by lm or glm.
infl
influence structure as returned by lm.influence or influence (see the defaults of the individual methods).
res 
(possibly weighted) residuals, with proper default. 
sd 
standard deviation to use, see default. 
dispersion
dispersion (for glm objects) to use, see default.
hat 
hat values H[i,i], see default. 
type
type of residuals for rstandard; the available options differ between the lm and glm methods.
x 
the X or design matrix. 
intercept
should an intercept column be prepended to x?
... 
further arguments passed to or from other methods. 
The primary high-level function is influence.measures, which produces an object of class "infl": a tabular display showing the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.
The functions dfbetas, dffits, covratio and cooks.distance provide direct access to the corresponding diagnostic quantities. The functions rstandard and rstudent give the standardized and Studentized residuals respectively. (These renormalize the residuals to have unit variance, using an overall and a leave-one-out measure of the error variance respectively.)
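As a minimal sketch of what "overall" versus "leave-one-out" means here, the two kinds of residuals can be rebuilt by hand from the pieces returned by lm.influence. The built-in cars data and the object names (fit, rstd_hand, rstu_hand, im) are illustrative assumptions, not part of the original examples.

```r
## Illustrative fit on a built-in data set (an assumption for this sketch)
fit  <- lm(dist ~ speed, data = cars)
infl <- lm.influence(fit, do.coef = FALSE)

e   <- residuals(fit)
h   <- infl$hat                                  # diagonal of the hat matrix
s   <- sqrt(deviance(fit) / df.residual(fit))    # overall error SD
s_i <- infl$sigma                                # leave-one-out error SDs

rstd_hand <- e / (s   * sqrt(1 - h))   # standardized residuals
rstu_hand <- e / (s_i * sqrt(1 - h))   # Studentized residuals

all.equal(unname(rstd_hand), unname(rstandard(fit)))
all.equal(unname(rstu_hand), unname(rstudent(fit)))

## The matrix inside the "infl" object carries the same quantities
## that the direct access functions return.
im <- influence.measures(fit)
all.equal(unname(im$infmat[, "cook.d"]), unname(cooks.distance(fit)))
all.equal(unname(im$infmat[, "dffit"]),  unname(dffits(fit)))
all.equal(unname(im$infmat[, "hat"]),    unname(hatvalues(fit)))
```

The only difference between the two residual types in this sketch is the error standard deviation in the denominator.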
Values for generalized linear models are approximations, as described in Williams (1987) (except that Cook's distances are scaled as F rather than as chi-square values). The approximations can be poor when some cases have large influence.
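The following sketch shows how the glm version of Cook's distance can be assembled from the default arguments listed in the usage (Pearson residuals, hat values and the dispersion). The division by the model rank p is what gives the F-like rather than chi-square scaling. The warpbreaks Poisson model and the names fit, phi and cook_hand are assumptions made for illustration only.

```r
## Illustrative Poisson GLM on a built-in data set (an assumption)
fit  <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson())
infl <- influence(fit, do.coef = FALSE)

p   <- fit$rank                    # number of estimated coefficients
phi <- summary(fit)$dispersion     # dispersion estimate used by default
h   <- infl$hat

## One-step approximation built from Pearson residuals and leverages
cook_hand <- (infl$pear.res / (1 - h))^2 * h / (phi * p)

all.equal(unname(cook_hand), unname(cooks.distance(fit)))
```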
The optional infl, res and sd arguments are there to encourage the use of these direct access functions in situations where, e.g., the underlying basic influence measures (from lm.influence or the generic influence) are already available.
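A short sketch of that pattern: compute the influence structure once and hand it to several diagnostics. The cars fit and the object names are illustrative assumptions. Note that dfbeta and dfbetas need the coefficient changes, so the structure is computed here with do.coef = TRUE.

```r
fit  <- lm(dist ~ speed, data = cars)        # illustrative model (an assumption)
infl <- lm.influence(fit, do.coef = TRUE)    # computed once

rs  <- rstandard(fit, infl = infl)
rt  <- rstudent(fit, infl = infl)
cd  <- cooks.distance(fit, infl = infl)
dfb <- dfbetas(fit, infl = infl)             # uses the coefficient changes in infl

## Same values as letting each function recompute the influence measures
all.equal(cd, cooks.distance(fit))
```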
Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.
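The effect on the length of the returned vectors can be sketched as follows. The data frame d, the missing value placed in row 3 and the zero weight placed on case 5 are all hypothetical choices made for illustration.

```r
d <- cars
d$dist[3] <- NA                      # hypothetical missing response

fit_omit    <- lm(dist ~ speed, data = d, na.action = na.omit)
fit_exclude <- lm(dist ~ speed, data = d, na.action = na.exclude)

length(rstandard(fit_omit))          # one value per case used in the fit
length(rstandard(fit_exclude))       # one value per row of d
rstandard(fit_exclude)[3]            # entry for the excluded case

w <- rep(1, nrow(cars))
w[5] <- 0                            # hypothetical zero weight
fit_w <- lm(dist ~ speed, data = cars, weights = w)
length(rstandard(fit_w))             # the zero-weight case is dropped
```

With na.exclude the diagnostics line up row by row with the original data, which is convenient when they are added back to the data frame as columns.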
For linear models, rstandard(*, type = "predictive") provides leave-one-out cross-validation residuals, and the “PRESS” statistic (PREdictive Sum of Squares, the same as the CV score) of model model is
PRESS <- sum(rstandard(model, type = "pred")^2)
The function hat() exists mainly for S (version 2) compatibility; we recommend using hatvalues() instead.
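For comparison, a small sketch of the two routes to the same leverages. Because model.matrix() already contains the intercept column, hat() is called with intercept = FALSE here. The cars fit and the names h_old and h_new are illustrative assumptions.

```r
fit <- lm(dist ~ speed, data = cars)   # illustrative model (an assumption)

X     <- model.matrix(fit)             # design matrix, intercept included
h_old <- hat(X, intercept = FALSE)     # S-compatible interface
h_new <- hatvalues(fit)                # recommended interface, keeps case names

all.equal(unname(h_old), unname(h_new))
sum(h_new)                             # leverages sum to the number of coefficients
```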
For hatvalues, dfbeta, and dfbetas, the method for linear models also works for generalized linear models.
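A brief sketch of calling these three functions on a glm fit. The warpbreaks Poisson model and the object names are assumptions chosen for illustration.

```r
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson())

h    <- hatvalues(fit)    # one leverage per case
dfb  <- dfbeta(fit)       # cases x coefficients matrix of coefficient changes
dfbs <- dfbetas(fit)      # the same changes, scaled

dim(dfb)
sum(h)                    # compare with fit$rank
```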
Authors: several R core team members and John Fox, originally in his ‘car’ package.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.
Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181–191. doi: 10.2307/2347550.
Fox, J. (1997). Applied Regression, Linear Models, and Related Methods. Sage.
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. Sage Publ.
Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression, second edition. Sage Publ; http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/.
See also influence (containing lm.influence), and ‘plotmath’ for the use of hat in plot annotation.
require(graphics)

## Analysis of the life-cycle savings data
## given in Belsley, Kuh and Welsch.
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

inflm.SR <- influence.measures(lm.SR)
which(apply(inflm.SR$is.inf, 1, any))
# which observations 'are' influential
summary(inflm.SR) # only these
inflm.SR          # all
plot(rstudent(lm.SR) ~ hatvalues(lm.SR)) # recommended by some
plot(lm.SR, which = 5) # an enhanced version of that via plot(<lm>)

## The 'infl' argument is not needed, but avoids recomputation:
rs <- rstandard(lm.SR)
iflSR <- influence(lm.SR)
identical(rs, rstandard(lm.SR, infl = iflSR))
## to "see" the larger values:
1000 * round(dfbetas(lm.SR, infl = iflSR), 3)
cat("PRESS :"); (PRESS <- sum( rstandard(lm.SR, type = "predictive")^2 ))
stopifnot(all.equal(PRESS, sum( (residuals(lm.SR) / (1 - iflSR$hat))^2)))

## Show that "PRE-residuals" == L.O.O. Crossvalidation (CV) errors:
X <- model.matrix(lm.SR)
y <- model.response(model.frame(lm.SR))
## Leave-one-out CV least-squares prediction errors (relatively fast)
rCV <- vapply(seq_len(nrow(X)), function(i)
              y[i] - X[i,] %*% .lm.fit(X[-i,], y[-i])$coef,
              numeric(1))
## are the same as the *faster* rstandard(*, "pred") :
stopifnot(all.equal(rCV, unname(rstandard(lm.SR, type = "predictive"))))

## Huber's data [Atkinson 1985]
xh <- c(-4:0, 10)
yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
lmH <- lm(yh ~ xh)
summary(lmH)
im <- influence.measures(lmH)
im
plot(xh, yh, main = "Huber's data: L.S. line and influential obs.")
abline(lmH); points(xh[im$is.inf], yh[im$is.inf], pch = 20, col = 2)

## Irwin's data [Williams 1987]
xi <- 1:5
yi <- c(0,2,14,19,30)    # number of mice responding to dose xi
mi <- rep(40, 5)         # number of mice exposed
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)
summary(glmI)
signif(cooks.distance(glmI), 3)   # ~= Ci in Table 3, p.184
imI <- influence.measures(glmI)
imI
stopifnot(all.equal(imI$infmat[,"cook.d"],
                    cooks.distance(glmI)))