[Rd] probable bugs in stats::loglin calculation of pearson chisq
Michael Friendly
friendly at yorku.ca
Tue Jul 9 20:29:21 CEST 2013
In running the following example of a loglinear model for the Titanic data,
I was surprised to see NaN reported for the
Pearson chisq
> loglin(Titanic, margin=list(1:3, 4))
2 iterations: deviation 2.273737e-13
$lrt
[1] 671.9622
$pearson
[1] NaN
$df
[1] 15
$margin
$margin[[1]]
[1] "Class" "Sex" "Age"
$margin[[2]]
[1] "Survived"
Tracing it back, this occurs because there are zeros in the
fitted/expected frequencies
for children among the Crew.
> # get fitted (expected) values
> fitted <- loglin(Titanic, margin=list(1:3, 4), fit=TRUE)$fit
2 iterations: deviation 2.273737e-13
> fitted[Class="Crew",,Age="Child",]
Survived
Sex No Yes
Male 0 0
Female 0 0
I certainly understand the difference between sampling
zeros and structural zeros, and this distinction seems properly
implemented in loglin()
via the start= argument, but only in the calculation of Pearson chisq,
not for LRT. I think this is a code
bug, but if there is a reason for the difference, it should be
documented in the help for
loglin.
Another probable bug is that the calculation of of the LRT chisq also
takes zero fitted values
into account, while the calculation of the Pearson chisq does not, and
leads to the NaN
result for my example.
It occurs in the following portion of the code for loglin:
fit <- z$fit
attributes(fit) <- attributes(table)
observed <- as.vector(table[start > 0])
expected <- as.vector(fit[start > 0])
pearson <- sum((observed - expected)^2/expected)
observed <- as.vector(table[table * fit > 0])
expected <- as.vector(fit[table * fit > 0])
lrt <- 2 * sum(observed * log(observed/expected))
I don't understand the reasons for the different calculations of
observed & expected
for pearson & lrt.
FWIW, below is how I calculate these in my mosaics.sas program
start chisq(obs, fit);
*-- Find Pearson and likelihood ratio chisquares;
gf = sum ( (obs - fit)##2 / ( fit + (fit=0) ) );
lr = 2 # sum ( obs # log ( (obs+(obs=0)) / (fit + (fit=0)) ) );
return (gf // lr);
finish;
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
More information about the R-devel
mailing list