[R] OT: A test with dependent samples.

Meyners, Michael, LAUSANNE, Applied Mathematics Michael.Meyners at rdls.nestle.com
Wed Feb 11 11:27:45 CET 2009


Rolf,

as you explicitly asked for a comment on your proposal: it is essentially
equivalent to McNemar's test, and maybe even more appropriate, because
McNemar's test relies on the asymptotic chi-squared distribution, which
might not be too good with n=12. In some more detail:

McNemar's test basically considers the difference between the
off-diagonal elements, normalized by their sum (the statistic is
(b-c)^2/(b+c) for off-diagonal counts b and c). You do the same,
ignoring all (0,0) and (1,1) pairs (the latter do not occur in your
example anyway). binom.test(12, 12, alternative="greater") gives you the
one-sided p-value.
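
For completeness, here is that call; since the test is exact, the
reported p-value is exactly (1/2)^12:

> # exact one-sided sign test: all 12 informative pairs are (0,1)
> binom.test(12, 12, alternative="greater")$p.value
[1] 0.0002441406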

If, however, you do not use the exact binomial distribution (with n=12)
but the normal approximation, you find E(X)=6 and Var(X)=3, with X the
number of (0,1) pairs, and you observed x=12. This gives you a
z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value:

> pnorm(6/sqrt(3), lower.tail=F)
[1] 0.0002660028

Further, McNemar's test WITHOUT continuity correction gives 

> mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F)

        McNemar's Chi-squared test

data:  matrix(c(61, 0, 12, 0), 2, 2) 
McNemar's chi-squared = 12, df = 1, p-value = 0.000532

It comes as no surprise that the reported (two-sided!) p-value is
exactly twice the (one-sided!) p-value from the binomial test with
normal approximation:

> pnorm(6/sqrt(3), lower.tail=F)*2
[1] 0.0005320055

I do not want to rehearse the pros and cons of continuity corrections
here, but if the correction is omitted, you see that you get the same
results (except that McNemar's test is inherently two-sided), because a
chi-squared variable with 1 df is simply the square of a standard
normal.
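
A one-line check of that relation (the chi-squared statistic is just
the squared z-score):

> z <- 6/sqrt(3)                       # z^2 = 12, the McNemar statistic
> pchisq(z^2, df=1, lower.tail=FALSE)  # = 2*pnorm(z, lower.tail=FALSE)
[1] 0.0005320055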

If you use the binomial test, you can forget about asymptotics and
continuity corrections; that is why I indeed consider your approach
superior to the chi-squared approximation used in McNemar's test. But
you might fail to find a reference for the exact approach...


You briefly asked in a later mail about testing for p=0. Indeed, _any_
incident will disprove this hypothesis, and the p-value reported by
> binom.test(1,73,p=0)

        Exact binomial test

data:  1 and 73 
number of successes = 1, number of trials = 73, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0 
95 percent confidence interval:
 0.0003467592 0.0739763232 
sample estimates:
probability of success 
            0.01369863 

is not wrong (R just tells us it is BELOW a certain value), but it
could be refined by saying "p-value = 0". BTW, I am not sure what
"p-value = TRUE" tells me, as produced by binom.test(0,73,p=0). I
personally don't care about either: to test H0: p=0, I would not use
any software but rather stress some basic statistical theory.
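
Incidentally, unless I am mistaken, the stored value in the first case
is exactly zero; only the print method truncates it:

> # the printed "p-value < 2.2e-16" is just formatting
> binom.test(1, 73, p = 0)$p.value
[1] 0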


There remains the question whether McNemar's test (or your proposal) is
appropriate here at all. Like others, I have doubts, except maybe if
observations were made on two consecutive days, on the second of which
the drug was administered. How could you otherwise be sure that the
increase in incidences is not due to, say, the progression of the
cancer, which would be difficult to rule out? Also, from my experience
and in contrast to your point of view, I'd rather assume it likely that
a vet is much more reluctant to apply a new treatment to an already
vomiting cat; your "baseline" of 0 out of 73 at least seems suspicious,
and until proven wrong I'd assume that this is not due to chance alone.

I understand that this is an observational study with no
control/randomization, but concluding that there is a side-effect from a
statistically significant change in incidence rates in this study seems
questionable to me. What I'd propose (and have used in comparable
situations) is to confine yourself to the confidence interval and let
the investigator decide / discuss whether its lower bound is higher than
what she would expect under a control or alternative treatment. She
needs to have some experience, if not historical data; and whether you
use a test or just this confidence interval, you will only get a hint at
a possible side-effect, not a formal proof (which is always difficult to
obtain from observational studies). Discussing the CI would seem fairer,
and even stronger, to me. In other words: mathematically, you'll get an
appropriate test, while conceptually, I'm far from being convinced.
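
To make that concrete: the exact (Clopper-Pearson) interval comes
straight out of binom.test. A minimal sketch:

> # exact 95% CI for the post-treatment vomiting rate, 12 of 73 cats;
> # the lower bound (roughly 0.09; the interval runs to about 0.27) is
> # what the investigator would compare with her expectation under
> # control or alternative treatment
> binom.test(12, 73)$conf.int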

Hope this makes sense, 
Michael




-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Rolf Turner
Sent: Tuesday, 10 February 2009 22:33
To: R-help Forum
Subject: [R] OT: A test with dependent samples.


I am appealing to the general collective wisdom of this list in respect
of a statistics (rather than R) question.  This question comes to me
from a friend who is a veterinary oncologist.  In a study that she is
writing up there were 73 cats who were treated with a drug called
piroxicam.  None of the cats were observed to be subject to vomiting
prior to treatment; 12 of the cats were subject to vomiting after
treatment commenced.  She wants to be able to say that the treatment had
a ``significant''
impact with respect to this unwanted side-effect.

Initially she did a chi-squared test.  (Presumably on the matrix
matrix(c(73,0,61,12),2,2) --- she didn't give details and I didn't
pursue this.)  I pointed out to her that because of the dependence ---
the same 73 cats pre- and post-treatment --- the chi-squared test is
inappropriate.

So what *is* appropriate?  There is a dependence structure of some sort,
but it seems to me to be impossible to estimate.

After mulling it over for a long while (I'm slow!) I decided that a
non-parametric approach, along the following lines, makes sense:

We have 73 independent pairs of outcomes (a,b) where a or b is 0 if the
cat didn't barf, and is 1 if it did barf.

We actually observe 61 (0,0) pairs and 12 (0,1) pairs.

If there is no effect from the piroxicam, then (0,1) and (1,0) are
equally likely.  So given that the outcome is in {(0,1),(1,0)} the
probability of each is 1/2.

Thus we have a sequence of 12 (0,1)-s where (under the null hypothesis)
the probability of each entry is 1/2.  Hence the probability of this
sequence is (1/2)^12 = 0.00024.  So the p-value of the (one-sided) test
is 0.00024.  Hence the result is ``significant'' at the usual levels,
and my vet friend is happy.

I would very much appreciate comments on my reasoning.  Have I made any
goof-ups, missed any obvious pit-falls?  Gone down a wrong garden path?

Is there a better approach?

Most importantly (!!!): Is there any literature in which this approach
is spelled out?  (The journal in which she wishes to publish will almost
surely demand a citation.  They *won't* want to see the reasoning
spelled out in the paper.)

I would conjecture that this sort of scenario must arise reasonably
often in medical statistics and the suggested approach (if it is indeed
valid and sensible) would be ``standard''.  It might even have a name!
But I have no idea where to start looking, so I thought I'd ask this
wonderfully learned list.

Thanks for any input.

	cheers,

		Rolf Turner

