[Rd] probable bugs in stats::loglin calculation of pearson chisq

Michael Friendly friendly at yorku.ca
Tue Jul 9 20:29:21 CEST 2013


In running the following example of a loglinear model for the Titanic data,
I was surprised to see NaN reported for the
Pearson chisq

 > loglin(Titanic, margin=list(1:3, 4))
2 iterations: deviation 2.273737e-13
$lrt
[1] 671.9622

$pearson
[1] NaN

$df
[1] 15

$margin
$margin[[1]]
[1] "Class" "Sex"   "Age"

$margin[[2]]
[1] "Survived"


Tracing it back, this occurs because there are zeros in the 
fitted/expected frequencies
for children among the Crew.

 > # get fitted (expected) values
 > fitted <- loglin(Titanic, margin=list(1:3, 4), fit=TRUE)$fit
2 iterations: deviation 2.273737e-13
 > fitted[Class="Crew",,Age="Child",]
         Survived
Sex      No Yes
   Male    0   0
   Female  0   0

I certainly understand the difference between sampling
zeros and structural zeros, and this distinction seems properly 
implemented in loglin()
via the start= argument, but only in the calculation of Pearson chisq, 
not for LRT.  I think this is a code
bug, but if there is a reason for the difference, it should be 
documented in the help for
loglin.

Another probable bug is that the calculation of of the LRT chisq also 
takes zero fitted values
into account, while the calculation of the Pearson chisq does not, and 
leads to the NaN
result for my example.
It occurs in the following portion of the code for loglin:

     fit <- z$fit
     attributes(fit) <- attributes(table)
     observed <- as.vector(table[start > 0])
     expected <- as.vector(fit[start > 0])
     pearson <- sum((observed - expected)^2/expected)
     observed <- as.vector(table[table * fit > 0])
     expected <- as.vector(fit[table * fit > 0])
     lrt <- 2 * sum(observed * log(observed/expected))

I don't understand the reasons for the different calculations of 
observed & expected
for pearson & lrt.

FWIW, below is how I calculate these in my mosaics.sas program

start chisq(obs, fit);
    *-- Find Pearson and likelihood ratio chisquares;
    gf = sum ( (obs - fit)##2 / ( fit + (fit=0) ) );
    lr = 2 # sum ( obs # log ( (obs+(obs=0)) / (fit + (fit=0)) ) );
    return (gf // lr);
    finish;

-Michael




-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



More information about the R-devel mailing list