[R] lm diagnostics and qr (fwd)
John Fox
jfox at mcmail.cis.mcmaster.ca
Thu Jun 26 17:24:17 CEST 2003
Dear Jean,
On Thu, 26 Jun 2003, Jean Eid wrote:
. . .
> My other question is on the regression diagnostics particularly plotting
> Cook's distance. what is the rule to decide on outliers. If I read the
> plot correctly, the labeled distances (vertical lines) are outliers. But I
> have gotten cook's distance and compared them to qf(0, p, n-p) ( the
> median of the F distribution with paramaters p=# of variables in design,
> number of obs.-p) but does not give same answer.
I presume you mean qf(0.5, p, n-p)?
>
. . .
Except for some sense of scale, it's not sensible to treat Cook's
distances as F-values. The use of an F statistic in this context is really
just a kind of trick to obtain a scale-invariant measure of distance
between the coefficient vector for all of the data and the coefficient
vector with that observation deleted. There is a rule-of-thumb cutoff for
noteworthy Cook's distances -- 4/(n - p) -- but I wouldn't place too much
stock in it. It's better simply to look for values of Cook's D that stand
out from the others. Finally, Cook's D isn't really an outlier diagnostic,
but an influence diagnostic. A low-leverage regression outlier, for
example, can have a small Cook's D.
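
To make the last point concrete, here is a minimal sketch in R using
simulated data. Everything in it is illustrative and assumed rather than
taken from your analysis: the data, the seed, and the names lo, hi, d, and
cutoff are hypothetical. It places two points the same vertical distance
above the true line, one at the centre of the x's and one far from them,
and then compares their studentized residuals, hat-values, and Cook's
distances.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Two hypothetical contaminated points, both 5 units above the true line:
lo <- 1                          # low leverage: x at the centre of the data
hi <- 2                          # high leverage: x far from the other x's
x[lo] <- mean(x[-(1:2)]); y[lo] <- 1 + 2 * x[lo] + 5
x[hi] <- 6;               y[hi] <- 1 + 2 * x[hi] + 5

fit <- lm(y ~ x)
p   <- length(coef(fit))         # number of coefficients (here 2)
cd  <- cooks.distance(fit)

# Side-by-side diagnostics for the two contaminated points
d <- data.frame(rstudent = rstudent(fit),
                hat      = hatvalues(fit),
                cooks.d  = cd)
print(round(d[c(lo, hi), ], 3))

cutoff <- 4 / (n - p)            # rule of thumb; a rough guide only
print(which(cd > cutoff))        # observations exceeding the rough cutoff

# Index plot: look for values that stand out from the rest
plot(cd, type = "h", ylab = "Cook's distance")
abline(h = cutoff, lty = 2)
```

In a setup like this, the point at the centre of the x's is a clear
regression outlier by its studentized residual, yet its Cook's D should be
far smaller than that of the point at x = 6, because its leverage is so
low. The index plot is the more useful output: the dashed line marks the
rough 4/(n - p) cutoff, but what matters is which spikes stand apart from
the rest.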
I hope that this helps,
John