[R] Can I use "mcnemar.test" for 3*3 tables (or is there a bug in the command?)

David Winsemius dwinsemius at comcast.net
Mon Jul 20 20:02:28 CEST 2009

On Jul 20, 2009, at 5:22 AM, Tal Galili wrote:

> Hello David Winsemius and the rest of the R help group,
> David, I tried to answer your questions to the best of my ability.
> If I was unclear or am still leaving some things out, please help me
> focus my situation even further. Here are my answers to the
> questions you posed:
> 1) Please define "better" -
> "better" is the one that is able to handle the questions at hand
> (marginal homogeneity and symmetry) while giving meaningful results
> even though the data are sometimes sparse (with zeros in them) and the
> sample size is somewhat small (around 25 kids)

Frankly, the phrases "marginal homogeneity" and "symmetry" are, for me  
anyway, not particularly evocative of an interpretable sort of  
difference. I try to express my findings in terms I think my audience  
may have some chance of understanding:  odds ratios or risk ratios or  
difference in mean effects ...

> 2) And now ... define "right"
> What I meant by "right" is "what test did each of these procedures
> just perform" and also "what can I learn from each of the P's if
> they were to pass the significance bar (of, let's say, .05)"

> 3) "Perhaps from the perspective of a statistically naive reviewer."
> Thank you for pointing out that this is superficial; I would love
> any help you could give in deepening my understanding.

It appeared from context (which was snipped) that you thought one was  
"better" because its p-value was lower.  If the criterion by which you  
choose one statistical test over another is whether or not it happens  
to produce p < 0.05, then I think you are dredging rather than
analyzing. I think the question should be instead whether the test is  
the most powerful for the particular hypothesis and data situation.

> 4) "The problem I am trying to solve" is for the following situation:
> The data set:
> I am analyzing a data set with subjects (kids) listening to the same
> music two times (randomized, at different times, and so on). The
> condition of the experiment is a bit different the first time the
> kid listens (X=1) than the second time (X=2).
> The response (Y) the kid gives in the experiment is recorded as an
> ordered number with three levels: -1, 0, 1

So you would certainly want a test that properly handles ordinal  
effects. I am not sure that was clear at all from your earlier  
questions. Tests of hypotheses regarding ordered alternatives are  
often more powerful than ones that evaluate less specific alternatives.

> The (statistical) question: did the difference in the experimental
> conditions yield different rankings from the kids? And if so, was
> there a specific direction?
> E.g., did kids who until then (in part one of the experiment) answered
> mostly -1 and 0 start answering more 0 and 1 (in part two of the
> experiment)? Or did kids who until then mostly answered 0 start
> answering -1 and 1? And so on.
> Analysis approach:
> There are two basic ways to do this.
> 1) The first one is a Willcox test, to see if there was a change in
> answers (Y) between the two situations (X=1, X=2)

I am here puzzled. Is the Willcox test a well known one in your  
academic domain? If it is I apologize for my lack of breadth in named  
tests.  Or could you be referring to what is invoked in R with  
wilcox.test()?  I am guessing from context that you might be asking  
about the Wilcoxon signed-rank test for paired data situations. It  
would in fact address the ordering of your paired outcomes, but all of  
the Wilcoxon tests are based on the measures being from a continuous  
distribution and statistical validity for your situation would be  

I would think that a proportional odds model for ordinal repeated  
responses would fit the data situation and the hypothesis of interest.
You may want to search out Laura Thompson's R/S companion to Agresti's  
text. She has some worked examples.
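To illustrate the concern about the Wilcoxon signed-rank test raised above, here is a small sketch with made-up paired responses on the -1/0/1 scale (y1 and y2 are hypothetical, not Tal's data). With only three levels, many paired differences are exactly zero and the rest are heavily tied, so wilcox.test() discards the zeros and uses a normal approximation with a tie correction instead of the exact signed-rank distribution.

```r
# Hypothetical paired ordinal responses (-1, 0, 1) for 10 kids:
# y1 = first listening (X = 1), y2 = second listening (X = 2).
y1 <- c(-1, -1, 0, 0, 0, 1, -1, 0, 1, 0)
y2 <- c( 0, -1, 1, 0, 1, 1,  1, 0, 0, 1)

d <- y2 - y1
table(d)       # the differences take only a handful of distinct values
sum(d == 0)    # pairs with no change; the signed-rank test drops them

# exact = FALSE asks for the normal approximation explicitly; with zeros
# and ties an exact p-value is not available anyway.
res <- wilcox.test(y2, y1, paired = TRUE, exact = FALSE)
res$statistic  # V = sum of the ranks of the positive differences
res$p.value
```

Here 4 of the 10 pairs are dropped before ranking, and the 6 that remain share only two distinct absolute values.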

> 2) The second one is to produce a 3 by 3 table, with the rows  
> indicating what the kids answered to setting 1 of the experiment,  
> and the columns indicating the kids answers to setting 2.
> Now the question is:
> Was there marginal homogeneity? If not, that is an indication
> that the general response to the experimental settings was different
> for the kids.

Can you put into natural language what you will explain to your
audience once you determine the presence or absence of "marginal
homogeneity"?

> Challenges:
> 1) What about symmetry?
> As Peter pointed out, you can easily check that the following two
> matrices have the same homogeneous margins, but only one is symmetric:
> 3 2 1
> 2 3 2
> 1 2 3
>
> 3 1 2
> 3 3 1
> 0 3 3
> Running the two tests we have yields very interesting results
> (if someone has an explanation for them, it would be greatly
> appreciated):
> > tt <- as.table(t(matrix(c(30,10,20,
> +                           30,30 ,10,
> +                           0 ,30 ,30)
> +                           , ncol .... [TRUNCATED]

The truncation is most unfortunate, since it results in our not seeing
what made these two calls different.
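Peter's point about the two small matrices can be checked directly in base R. In this sketch (the names m1 and m2 are chosen here for illustration) both tables have row totals equal to column totals (6, 7, 6), but only the first satisfies n[i, j] == n[j, i]. Because mcnemar.test() on a k x k table tests symmetry rather than marginal homogeneity, it returns 0 for the first table and 4 for the second.

```r
# Peter's two 3 x 3 tables, entered row by row.
m1 <- matrix(c(3, 2, 1,
               2, 3, 2,
               1, 2, 3), nrow = 3, byrow = TRUE)
m2 <- matrix(c(3, 1, 2,
               3, 3, 1,
               0, 3, 3), nrow = 3, byrow = TRUE)

# Marginal homogeneity: row totals equal column totals (6, 7, 6 for both).
rbind(rows = rowSums(m1), cols = colSums(m1))
rbind(rows = rowSums(m2), cols = colSums(m2))

# Symmetry: n[i, j] == n[j, i] for every off-diagonal pair.
isSymmetric(m1)  # TRUE
isSymmetric(m2)  # FALSE

# mcnemar.test() on a k x k table tests symmetry, so it separates the two
# tables even though their margins are identical.
mcnemar.test(m1)$statistic  # 0
mcnemar.test(m2)$statistic  # 4, on 3 df
```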
> > print(tt)
>    A  B  C
> A 30 10 20
> B 30 30 10
> C  0 30 30
> > mcnemar.test(tt)
>         McNemar's Chi-squared test
> data:  tt
> McNemar's chi-squared = 40, df = 3, p-value = 1.066e-08
> > mh_test(tt)
>         Asymptotic Marginal-Homogeneity Test
> data:  response by groups (Var1, Var2)
>          stratified by block
> chi-squared = 0, df = 2, p-value = 1
> > tt <- as.table(t(matrix(c(30,10,20,
> +                           30,30 ,10,
> +                           1 ,30 ,30)
> +                           , ncol .... [TRUNCATED]
> > print(tt)
>    A  B  C
> A 30 10 20
> B 30 30 10
> C  1 30 30
> > mcnemar.test(tt)
>         McNemar's Chi-squared test
> data:  tt
> McNemar's chi-squared = 37.1905, df = 3, p-value = 4.194e-08

The truncation snipped off the likely sources of the differences.
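The contrast between the two tests is easier to see when both quantities are computed directly. A minimal sketch, with bowker_stat() as a hypothetical helper and the tables re-entered from the printed output: the first table is far from symmetric (for example, cell [1, 3] = 20 against [3, 1] = 0), which drives the McNemar/Bowker symmetry statistic to 40. Its row and column totals are identical, however, so a marginal-homogeneity test such as coin's mh_test() has nothing to detect. Changing the 0 to 1 shifts the margins by a single count.

```r
# Hypothetical helper: Bowker's symmetry statistic, which is what
# mcnemar.test() reports for a k x k table with k > 2 (no continuity
# correction): sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji).
bowker_stat <- function(tab) {
  tab <- matrix(as.numeric(tab), nrow = nrow(tab))
  up <- upper.tri(tab)
  sum((tab - t(tab))[up]^2 / (tab + t(tab))[up])
}

# The two tables from the printed output, entered row by row.
tt0 <- matrix(c(30, 10, 20,
                30, 30, 10,
                 0, 30, 30), nrow = 3, byrow = TRUE)
tt1 <- tt0
tt1[3, 1] <- 1   # the second run differs only in this cell

bowker_stat(tt0)  # 40 = 10 + 20 + 10
bowker_stat(tt1)  # about 37.19

# Margins: what a marginal-homogeneity test compares.
rowSums(tt0) - colSums(tt0)  # 0 0 0
rowSums(tt1) - colSums(tt1)  # -1 0 1
```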
> > mh_test(tt)
>         Asymptotic Marginal-Homogeneity Test
> data:  response by groups (Var1, Var2)
>          stratified by block
> chi-squared = 0.0244, df = 2, p-value = 0.9879
> 2) What about sparsity?
> What is the correct way to handle sparse tables that include some
> zeros?
> (Is filling them with 1's, in cases where mcnemar.test returns
> NA's, a legitimate strategy?)
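Where the NA comes from can be seen in a small sketch with a hypothetical table (not Tal's data). mcnemar.test() sums (n_ij - n_ji)^2 / (n_ij + n_ji) over the off-diagonal pairs, so a pair in which both cells are zero contributes 0/0, and the whole statistic becomes NaN. The sketch shows the mechanism only; it does not settle whether adding 1 to cells is legitimate.

```r
# A hypothetical sparse table in which both cells of the [1, 2] / [2, 1]
# pair are empty.
m <- matrix(c(5, 0, 1,
              0, 4, 1,
              2, 3, 6), nrow = 3, byrow = TRUE)

res <- mcnemar.test(m)
res$statistic  # NaN
res$p.value    # not computable either

# Per-pair contributions, upper triangle in column order:
# [1, 2], [1, 3], [2, 3]. The first one is 0 / 0.
up <- upper.tri(m)
contrib <- (m - t(m))[up]^2 / (m + t(m))[up]
contrib
```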
> David, thank you for the queries and the good intentions,
> I would be very happy for any help, directions, or clarifications from
> you and from the other members of this wonderful discussion group.
> With much gratitude,
> Tal
> I hope this helps clarify things.
> On Mon, Jul 20, 2009 at 3:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> On Jul 19, 2009, at 6:09 PM, Tal Galili wrote:
> Hello Charles,
> Thank you for the detailed reply.
> I am still left with the leading question, which is: which test
> should I use when analyzing the 3 by 3 matrix I have, mcnemar.test
> or mh_test?
> Is one necessarily better than the other?
> Please define "better".
> (for example, for sparser matrices?)
> That does not help.
> What about:
> mh_test(as.table(matrix(1:16,4)))
> It returns a very significant result:
> chi-squared = 11.4098, df = 3, p-value = 0.009704
> Whereas "mcnemar.test(matrix(1:16,4))" didn't:
> McNemar's chi-squared = 11.5495, df = 6, p-value = 0.0728
> So which one is "right" ?
> And now ... define "right".
> (from the looks of it, the mh_test is doing much better)
> Perhaps from the perspective of a statistically naive reviewer.
> Should the strategy be to try and use both methods, and start  
> digging when
> one doesn't sit well with the other?
> I am reminded of Jim Holtman's tag line: "What problem are you
> trying to solve?"
> Thanks,
> Tal
> On Sun, Jul 19, 2009 at 10:26 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> On Sun, 19 Jul 2009, Tal Galili wrote:
> Hello David, thank you for your answer.
> Do you know, then, what "mcnemar.test" does in the case of a 3*3
> table?
>      print(mcnemar.test)
> will show you what it does.
> I ask because the results for the simple example I gave are rather
> different (P value of 0.053 vs. 0.73).
> The test mcnemar.test() constructs is one of symmetry, which is
> equivalent to marginal homogeneity in hierarchical log-linear models,
> as I recall from Bishop, Fienberg, and Holland's 1975 opus on count data.
> Stuart-Maxwell uses the dispersion matrix of the marginal differences.
> These are two different tests. I suspect that Stuart-Maxwell is less
> susceptible to continuity issues in very sparse tables, which may
> account for the difference you see here.
> If mcnemar.test can't really handle a 3*3 (or larger) matrix,
> shouldn't there be an error message for this case? (If so, whom
> should I contact in order to report this?)
> Well, the code is pretty straightforward and
>      mcnemar.test(matrix(1:16,4))
> returns 11.5495 which is correct.
> It looks like there is nothing to report.
> Chuck
> Thanks again,
> Tal
> On Sun, Jul 19, 2009 at 3:47 PM, David Freedman <3.14david at gmail.com> wrote:
> There is a function mh_test in the coin package.
> library(coin)
> mh_test(tt)
> The documentation states, "The null hypothesis of independence of row and
> column totals is tested. The corresponding test for binary factors x and y
> is known as McNemar test. For larger tables, Stuart’s W0 statistic
> (Stuart, 1955, Agresti, 2002, page 422, also known as Stuart-Maxwell test)
> is computed."
> hth, david freedman
> Tal Galili wrote:
> Hello all,
> I wish to perform a McNemar test on a 3 by 3 matrix.
> Running the standard R command, I get a result, but I am not sure
> I am getting the correct one.
> Here is an example code:
> (tt <- as.table(t(matrix(c(1,4,1,
>                            0,5,5,
>                            3,1,5), ncol = 3))))
> mcnemar.test(tt, correct = TRUE)
> #And I get:
>      McNemar's Chi-squared test
> data:  tt
> McNemar's chi-squared = 7.6667, df = 3, p-value = *0.05343*
> Now I was wondering if the test I just performed is the correct one.
> The Wikipedia article on McNemar's test
> (http://en.wikipedia.org/wiki/McNemar's_test) says:
> "The Stuart-Maxwell test
> <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm> is
> different generalization of the McNemar test, used for testing marginal
> homogeneity in a square table with more than two rows/columns"
> From searching Google for a Stuart-Maxwell test
> <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm>,
> I found an algorithm here:
> http://www.m-hikari.com/ams/ams-password-2009/ams-password9-12-2009/abbasiAMS9-12-2009.pdf
> Running this algorithm, I get a different P value. Here is the
> (somewhat ugly) code I produced for this:
> get.d <- function(xx)
> {
>   length1 <- dim(xx)[1]
>   ret1 <- margin.table(xx,1) - margin.table(xx,2)
>   return(ret1)
> }
> get.s <- function(xx)
> {
>   the.s <- xx
>   for (i in 1:dim(xx)[1])
>   {
>     for (j in 1:dim(xx)[2])
>     {
>       if (i == j)
>       {
>         the.s[i,j] <- margin.table(xx,1)[i] + margin.table(xx,2)[i] - 2*xx[i,i]
>       } else {
>         the.s[i,j] <- -(xx[i,j] + xx[j,i])
>       }
>     }
>   }
>   return(the.s)
> }
> chi.statistic <- t(get.d(tt)[-3]) %*% solve(get.s(tt)[-3,-3]) %*% get.d(tt)[-3]
> paste("the P value:", pchisq(chi.statistic, 2))
> #and the result was:
> "the P value: 0.268384371053358"
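A compact sketch of the same Stuart-Maxwell computation, with one correction: pchisq(x, df) returns the lower-tail probability P(X <= x), so the printed 0.268 is 1 minus the p-value, and the upper tail, about 0.7316, is the 0.73 quoted elsewhere in the thread. The helper stuart_maxwell() is hypothetical, written for this illustration rather than taken from a package. On the examples in this thread its statistic agrees with the values mh_test() reported (11.4098 for matrix(1:16, 4); 0 and 0.0244 for the two 3 x 3 tables of 30s).

```r
# Hypothetical helper: Stuart-Maxwell test of marginal homogeneity for a
# k x k table, built from the same d and S as get.d() and get.s() above.
stuart_maxwell <- function(tab) {
  tab <- matrix(as.numeric(tab), nrow = nrow(tab))
  k <- nrow(tab)
  d <- rowSums(tab) - colSums(tab)       # marginal differences
  S <- -(tab + t(tab))                   # off-diagonal: -(n_ij + n_ji)
  diag(S) <- rowSums(tab) + colSums(tab) - 2 * diag(tab)
  keep <- seq_len(k - 1)                 # d sums to 0, so drop one category
  stat <- sum(d[keep] * solve(S[keep, keep], d[keep]))
  df <- k - 1
  # The p-value is the UPPER tail; pchisq(stat, df) alone gives P(X <= stat).
  p <- pchisq(stat, df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p)
}

tt <- as.table(t(matrix(c(1, 4, 1,
                          0, 5, 5,
                          3, 1, 5), ncol = 3)))
stuart_maxwell(tt)
# statistic 0.625 on 2 df; p-value exp(-0.625 / 2), about 0.7316
```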
> So to summarize my questions:
> 1) Can I use "mcnemar.test" for 3*3 (or larger) tables?
> 2) If so, what test is being performed (Stuart-Maxwell
> <http://ourworld.compuserve.com/homepages/jsuebersax/mcnemar.htm>)?
> 3) Do you have a recommended link to an explanation of the algorithm
> employed?
> Thanks,
> Tal

> snipped various sigs

