[R] ?Accuracy of prop.test

(Ted Harding) ted.harding at wlandres.net
Mon Jul 18 00:24:50 CEST 2011


On 17-Jul-11 16:27:25, Jack Sofsky wrote:
> I have just joined this list (and just started using R), so please 
> excuse any etiquette breaches as I do not yet have a feel for how the 
> list operates.
> 
> I am in the process of teaching myself statistics using R as my utility,
> as my ultimate goals cannot be satisfied by Excel or any of the
> plug-ins I could afford.
> 
> I am currently looking at chap12 page 552 of Weiss's Introductory
> Statistics 9th edition.  Example 12.5 demonstrates using "Technology"
> to obtain a One-Proportion z-Interval.
> 
> x=202
> n=1010
> confidence level = .95.
> 
> Answer given by Minitab
> 0.175331, .224669
> Answer given by TI-83/84
> .17533, .22467
> Answer given by Weiss's Excel Plug-in
> 0.175 < p < 0.225
> 
> Here is what I got with R
> prop.test(202,1010,correct="FALSE")
> 
>      1-sample proportions test without continuity correction
> 
> data:  202 out of 1010, null probability 0.5
> X-squared = 363.6, df = 1, p-value < 2.2e-16
> alternative hypothesis: true p is not equal to 0.5
> 95 percent confidence interval:
>   0.1764885 0.2257849
> sample estimates:
>    p
> 0.2
> 
> I'm also getting slight differences in the answers for exercises
> and find this disconcerting.
> 
> Why are these differences present  (or am I doing something wrong)?
> Jack

You are not doing anything wrong (at any rate where prop.test is
concerned). The point is that none of these procedures uses an
exact method; all are based on some form of approximation. That is
certainly true of R's prop.test, and undoubtedly also of the others
(to which, however, I do not have access).

In the case of R's prop.test (see the help in '?prop.test'):
"The confidence interval is computed by inverting the score test."
That is to say that (possibly after applying Yates's correction)
a Normal-distribution approximation is used for the distribution
of the Z score, (deviation)/(SD of deviation), with the SD evaluated
at the candidate value of p rather than at the estimate. I do not
know what methods the others use.
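
As a check on what "inverting the score test" amounts to, here is a
minimal sketch that computes the score (Wilson) interval by hand and
compares it with prop.test without continuity correction. The variable
names are my own, and the closed-form expression is the standard
textbook one, not copied from R's source:

```r
# Score (Wilson) interval by hand for x = 202 successes in n = 1010 trials.
x <- 202; n <- 1010
phat <- x / n
z <- qnorm(0.975)                      # two-sided 95% Normal quantile

# Solving |phat - p| = z * sqrt(p * (1 - p) / n) for p gives:
centre <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

# Interval reported by prop.test (attributes dropped for the comparison).
r_ci <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

print(wilson)
print(all.equal(wilson, r_ci))
```

The two should agree to within floating-point error, since the SD in
the Z score is taken at the candidate p and not at the estimate 0.2.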

No doubt the different answers are the result of using different
approximations.
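
One candidate for such an approximation is the simplest textbook
"z-interval", estimate +/- z * (SD at the estimate), often called the
Wald interval. The sketch below computes it for these data. That
Minitab or the TI-83/84 actually work this way is an assumption on my
part, suggested only by the numbers coinciding:

```r
# Wald ("plug-in") interval: the SD is evaluated at the estimate phat.
x <- 202; n <- 1010
phat <- x / n
z <- qnorm(0.975)                      # two-sided 95% Normal quantile

wald <- phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)

# Compare with the Minitab figures quoted above (0.175331, 0.224669).
print(round(wald, 6))
```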

If you want a really exact method, find
a) The highest value of p such that the probability of
   a result greater than or equal to x=202 when n=1010
   is at most 0.025 (2.5%)
b) The lowest value of p such that the probability of
   a result less than or equal to x=202 when n=1010
   is at most 0.025 (2.5%)
These are then, respectively, the lower and upper 95% confidence
limits for p with equal "non-coverage" probability (2.5%) at
either side. You can search for these "by hand", with results:

a):
  pbinom(201,1010,0.175739300)
  # [1] 0.97500000   ## = 1 - Prob(x >= 202)

b):
  pbinom(202,1010,0.226022815)
  # [1] 0.02500000   ## = Prob(x <= 202)

so (to 7 decimal places) an exact 95% CI for p is

  (0.1757393,0.2260228)
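
Rather than searching by hand, the same two limits can be found
numerically. The following is a sketch only: the bracketing intervals
passed to uniroot are my own choice, and binom.test is used as a
cross-check because its interval is the exact (Clopper-Pearson) one:

```r
# Exact 95% limits for x = 202, n = 1010, by root-finding on pbinom.
x <- 202; n <- 1010

# a) Lower limit: p at which Prob(X >= x) = 0.025,
#    i.e. Prob(X <= x - 1) = 0.975.
lower <- uniroot(function(p) pbinom(x - 1, n, p) - 0.975,
                 interval = c(0.01, 0.2), tol = 1e-12)$root

# b) Upper limit: p at which Prob(X <= x) = 0.025.
upper <- uniroot(function(p) pbinom(x, n, p) - 0.025,
                 interval = c(0.2, 0.5), tol = 1e-12)$root

# Cross-check against the interval from binom.test.
cp <- as.numeric(binom.test(x, n)$conf.int)

print(c(lower, upper))
print(all.equal(c(lower, upper), cp, tolerance = 1e-7))
```

In everyday use binom.test(202, 1010)$conf.int alone would do; the
uniroot version is there only to mirror steps a) and b).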

This agrees with none of the methods you tried (though all
are fairly close together):

(0.1757393,0.2260228) ## Above exact
(0.1753310,0.2246690) ## Minitab
(0.1753300,0.2246700) ## TI-83/84
(0.1750000,0.2250000) ## Weiss's Excel Plug-in
(0.1764885,0.2257849) ## R's prop.test

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-11                                       Time: 23:24:47
------------------------------ XFMail ------------------------------


