[Rd] summary.table bug in parameter (and fix) (PR#2526)
rreeve@liposcience.com
rreeve@liposcience.com
Mon Feb 3 18:43:02 2003
I sent this in with an old version, but it's in latest version as well. The fix is simple.
In the summary.table function, the parameter is calculated incorrectly
for a test of independence among all cells when the table is more than
2-way table.
Example:
Consider X:
> X
a b c
1 A1 B2 C1
2 A3 BA3 C2
3 A2 B1 C4
4 A1 B2 C3
5 A3 BA3 C2
6 A1 BA3 C1
7 A2 BA3 C2
8 A1 BA3 C3
9 A1 B2 C1
10 A1 BA3 C2
11 A2 B1 C2
12 A1 B2 C3
13 A3 BA3 C4
14 A2 B1 C3
15 A2 B1 C2
16 A2 BA3 C4
17 A3 BA3 C3
18 A3 BA3 C4
19 A2 BA3 C4
20 A1 B2 C3
21 A1 BA3 C1
22 A2 B1 C2
23 A2 BA3 C3
24 A2 B1 C2
25 A2 B1 C2
26 A3 B1 C1
27 A2 BA3 C1
28 A2 BA3 C2
29 A2 BA3 C4
30 A3 B1 C2
31 A1 B1 C4
32 A2 B2 C1
33 A1 B2 C3
34 A2 BA3 C3
35 A2 BA3 C1
36 A2 BA3 C1
37 A2 B2 C2
38 A2 B2 C2
39 A2 B2 C4
40 A2 B1 C4
41 A2 B2 C4
42 A1 B1 C3
43 A2 BA3 C4
44 A2 BA3 C2
45 A2 B1 C2
46 A2 B2 C1
47 A3 B1 C3
48 A1 B2 C1
49 A2 B1 C2
50 A1 B1 C1
51 A3 B2 C4
52 A1 B2 C1
53 A3 B2 C1
54 A2 BA3 C1
55 A2 B1 C2
56 A1 BA3 C3
57 A3 B1 C2
58 A1 B1 C3
59 A2 B2 C2
60 A2 BA3 C1
61 A2 B1 C3
62 A3 B2 C2
63 A2 BA3 C2
64 A2 B1 C3
65 A1 BA3 C3
66 A2 B2 C2
67 A2 B1 C4
68 A2 BA3 C2
69 A2 BA3 C2
70 A2 B1 C3
71 A3 BA3 C2
72 A3 BA3 C3
73 A2 B2 C1
74 A2 B1 C3
75 A1 BA3 C1
76 A2 BA3 C3
77 A3 B1 C4
78 A3 BA3 C3
79 A2 B1 C3
80 A2 B1 C3
81 A2 BA3 C4
82 A2 B1 C2
83 A3 BA3 C1
84 A2 BA3 C1
85 A2 BA3 C4
86 A3 BA3 C1
87 A2 BA3 C1
88 A2 B2 C1
89 A3 BA3 C2
90 A2 BA3 C1
91 A1 B1 C4
92 A2 B1 C1
93 A2 B1 C1
94 A2 B1 C3
95 A2 BA3 C1
96 A3 B2 C1
97 A3 BA3 C4
98 A2 B1 C2
99 A2 BA3 C3
100 A2 B1 C2
> X.table <- table(X)
> X.table
, , c = C1
b
a B1 B2 B3
A1 3 2 3
A2 6 4 5
A3 3 2 0
, , c = C2
b
a B1 B2 B3
A1 0 0 1
A2 7 7 7
A3 3 1 3
, , c = C3
b
a B1 B2 B3
A1 3 3 3
A2 6 3 3
A3 1 0 3
, , c = C4
b
a B1 B2 B3
A1 0 0 2
A2 2 2 7
A3 4 1 0
Now, we construct a table from X (3x3x4):
> X.table <- table(X)
> summary(X.table)
Number of cases in table: 100
Number of factors: 3
Test for independence of all factors:
Chisq = 30.232, df = 12, p-value = 0.002576
Chi-squared approximation may be incorrect
The df is incorrect. It is calculated as 2*2*3 = 12, but should be 3*3*4 - 1 - (3-1) - (3-1) - (4-1) = 28 (df of interaction terms not in main effects only model). Now consider equivalent model tested in loglin:
> loglin(X.table, list(1,2,3))
2 iterations: deviation 3.552714e-15
$lrt
[1] 38.32701
$pearson
[1] 30.23244
$df
[1] 28
$margin
$margin[[1]]
[1] "a"
$margin[[2]]
[1] "b"
$margin[[3]]
[1] "c"
The statistic is identical, but df different. The problem is in summary.table:
Current version of summary.table (INCORRECT VERSION):
function (object, ...)
{
if (!inherits(object, "table"))
stop("object must inherit from class table")
n.cases <- sum(object)
n.vars <- length(dim(object))
y <- list(n.vars = n.vars, n.cases = n.cases)
if (n.vars > 1) {
m <- vector("list", length = n.vars)
for (k in seq(along = m)) {
m[[k]] <- apply(object, k, sum)/n.cases
}
expected <- apply(do.call("expand.grid", m), 1, prod) *
n.cases
statistic <- sum((c(object) - expected)^2/expected)
parameter <- prod(sapply(m, length) - 1) #### Problem is on this line (works only for 2-way tables)
y <- c(y, list(statistic = statistic, parameter = parameter,
approx.ok = all(expected >= 5), p.value = pchisq(statistic,
parameter, lower.tail = FALSE), call = attr(object,
"call")))
}
class(y) <- "summary.table"
y
}
summary.table should be (CORRECTED VERSION)
function (object, ...)
{
if (!inherits(object, "table"))
stop("object must inherit from class table")
n.cases <- sum(object)
n.vars <- length(dim(object))
y <- list(n.vars = n.vars, n.cases = n.cases)
if (n.vars > 1) {
m <- vector("list", length = n.vars)
for (k in seq(along = m)) {
m[[k]] <- apply(object, k, sum)/n.cases
}
expected <- apply(do.call("expand.grid", m), 1, prod) *
n.cases
statistic <- sum((c(object) - expected)^2/expected)
parameter <- prod(sapply(m, length)) - (sum(sapply(m,
length) - 1) + 1) ### This line changed (works for all multiway tables)
y <- c(y, list(statistic = statistic, parameter = parameter,
approx.ok = all(expected >= 5), p.value = pchisq(statistic,
parameter, lower.tail = FALSE), call = attr(object,
"call")))
}
class(y) <- "summary.table"
y
}
Using the corrected summary.table (using the correct code above):
> summary(X.table)
Number of cases in table: 100
Number of factors: 3
Test for independence of all factors:
Chisq = 30.232, df = 28, p-value = 0.3522
Chi-squared approximation may be incorrect
These are correct.
--Russell Reeve
rreeve@liposcience.com
--please do not edit the information below--
Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status =
major = 1
minor = 6.2
year = 2003
month = 01
day = 10
language = R
Windows 2000 Professional (build 2195) Service Pack 2.0
Search Path:
.GlobalEnv, package:ctest, Autoloads, package:base