[R] NaN when using dffits, stemming from lm.influence call
Peter Dunn
dunn at usq.edu.au
Thu Aug 31 04:18:52 CEST 2006
Hi all
I'm getting a NaN returned on using dffits, as explained
below. To me, there seems no obvious (or non-obvious reason
for that matter) reason why a NaN appears.
Before I start digging further, can anyone see why dffits
might be failing? Is there a problem with the data?
Consider:
# Load data
dep <-
read.table("http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/factor1-R.dat",
header=TRUE)
attach(dep)
dep
# Fit Poisson glm
dep.glm2 <- glm( Counts ~ factor(Depression) + factor(SLE) +
factor(Children) + factor(Depression):factor(SLE),
family=poisson( link=log) )
# Compute dffits
dffits( dep.glm2 )
This produces the output:
1 2 3 4 5 6
1.4207746 -0.1513808 NaN 0.9079142 -0.1032664 -1.0860289
7 8
0.4853797 3.8560863
NaN exists for Observation 3. I cannot understand why: there's
nothing grossly unusual or bad about it. I look a bit closer,
and it falls over in lm.influence when computing the deletion
statistic sigma:
> lm.influence(dep.glm2)$sigma
1 2 3 4 5 6 7 8
0.914829 2.134279 NaN 2.186707 2.224885 1.934539 2.225115 1.957111
The rest of the results from lm.influence are OK; for example:
> lm.influence(dep.glm2)$wt.res
1 2 3 4 5 6
2.62840627 -0.88476903 -1.09492912 0.20247856 -0.23114458 -0.95123387
7 8
0.07521515 0.30208051
Use of debug( lm.influence ) indicates the NaN appears in this line
of lm.influence:
res <- .Fortran("lminfl", model$qr$qr, n, n, k, as.integer(do.coef),
model$qr$qraux, wt.res = e, hat = double(n), coefficients = if
(do.coef) matrix(0,
n, k) else double(0), sigma = double(n), tol = 10 *
.Machine$double.eps,
DUP = FALSE, PACKAGE = "base")[c("hat", "coefficients", "sigma",
"wt.res")]
I don't particularly wish to dig around in the Fortran if someone
else can look at it and see my problem easily. But if I must...
The appearance of the NaN seems odd, since (as I understand it)
lm.influence(dep.glm2)$sigma computes sigma when each observation
is removed in turn. So if I remove Observation 3 and try fitting the
model, there are no problems or complaints:
dep.glm3 <- glm( Counts ~ factor(Depression) + factor(SLE) +
factor(Children) + factor(Depression):factor(SLE),
family=poisson( link=log), subset=(-3) )
This produces:
> dep.glm3
Call: glm(formula = Counts ~ factor(Depression) + factor(SLE) +
factor(Children) + factor(Depression):factor(SLE), family =
poisson(link = log), subset = (-3))
Coefficients:
(Intercept) factor(Depression)1
5.4389 -4.1392
factor(SLE)1 factor(Children)1
-0.6503 -2.4036
factor(Depression)1:factor(SLE)1
3.9513
Degrees of Freedom: 6 Total (i.e. Null); 2 Residual
Null Deviance: 695.9
Residual Deviance: 0.8535 AIC: 41.25
No problems, errors, or any signs of potential problems.
In the changes to R 2.3.0 (in the NEWS file,
eg http://mirror.aarnet.edu.au/pub/CRAN/src/base/NEWS)
I find this:
o Influence measures such as rstandard() and cooks.distance()
could return infinite values rather than NaN for a case which
was fitted exactly. Similarly, plot.lm() could fail on such
examples. plot.lm(which = 5) had to be modified to only plot
cases with hat < 1. (PR#8367)
lm.influence() was incorrectly reporting 'coefficients' and
'sigma' as NaN for cases with hat = 1, and on some platforms
not detecting hat = 1 correctly.
The last sentence identifies NaN being reported for sigma, as I
find with my data. But my data do not have hat = 1, but the hat
diagonals are large. The troublesome Observation 3 does not have the
largest hat value in the data though:
> hatvalues(dep.glm2)
1 2 3 4 5 6 7
8
0.1689061 0.1064651 0.9098542 0.9030814 0.3799079 0.6382790 0.9327408
0.9607654
And besides, I am using the most recent version of R (see below). BTW,
the NaNs appear in the previous version of R also.
So I'm flummoxed.
As always, help appreciated.
P.
> version
_
platform i386-pc-linux-gnu
arch i386
os linux-gnu
system i386, linux-gnu
status
major 2
minor 3.1
year 2006
month 06
day 01
svn rev 38247
language R
version.string Version 2.3.1 (2006-06-01)
--
Dr Peter Dunn | Email: dunn <at> usq.edu.au
Faculty of Sciences, University of Southern Queensland
and the Australian Centre for Sustainable Catchments
CRICOS: QLD 00244B | NSW 02225M | VIC 02387D | WA 02521C
More information about the R-help
mailing list