[R] Chi-squared test

Thu Nov 24 22:55:17 CET 2005

On 24-Nov-05 P Ehlers wrote:
> Bianca Vieru-Dimulescu wrote:
>> Hello,
>> I'm trying to calculate a chi-squared test to see if my data are 
>> different from the theoretical distribution or not:
>> 
>> chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
>>                  c(80,80,80,80,80,80,80,80,80,80,80,80)))
>> 
>>       Pearson's Chi-squared test
>> 
>> data:  rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
>>              c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
>> X-squared = 17.6, df = 11, p-value = 0.09142
>> 
>> Is this correct? When I do the same thing in Excel I obtain a
>> different p-value (1.65778E-14).
>> 
>> Thanks a lot,
>> Bianca
> 
> It would be unusual to have 12 observed frequencies all equal to 80.
> So I'm guessing that you have a 12-category variable and want to
> test its fit to a discrete uniform distribution. I assume that your
> frequencies are
> 
> x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
> 
> Then just use
> 
> chisq.test(x)
> 
> (see the help page).
> 
> (If those 80's are expected cell frequencies, they should sum to
> sum(x) = 851.)
> 
> I don't know what Excel does.
> 
> Peter
> 
> Peter Ehlers
> University of Calgary

I'm rather with Peter on this question! I've tried to infer what
you're really trying to do.

My a-priori plausible hypothesis was that you have

  k<-12

independent observations which have equal expected values

  m<-rep(80,k)

and are observed as

  x<-c(79,52,69,71,82,87,95,74,55,78,49,60)

On this basis, the chi-squared statistic Sum((O-E)^2/E) is

  C2<-sum(((x-m)^2)/m)

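so C2 = 41.1375. Since the expected values are taken as given
rather than estimated from the data (the x sum to 851, not to
12*80 = 960), on this hypothesis the chi-squared would have
k=12 degrees of freedom. Then: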

  1-pchisq(C2,k)
## [1] 4.647553e-05

which is nowhere near the 1.65778E-14 you report from Excel.
Also, the result from Peter's chisq.test(x) is p = 0.0006468,
even further away.
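
For anyone wanting to reproduce the three R results mentioned so far
side by side, here is a minimal sketch. It assumes, as guessed above,
that the 80s are hypothesised expected counts; the names res_table,
res_gof and p_fixed are mine, purely for illustration. Note that the
original call passes a 2 x 12 matrix, which chisq.test() treats as a
contingency table (a test of independence), not as observed versus
expected counts.

```r
# Observed counts from the original post, and the fixed value 80
# (reading 80 as a hypothesised expected count is a guess, as above).
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
k <- length(x)
m <- rep(80, k)

# (a) The original call: rbind() makes a 2 x 12 table, so chisq.test()
#     tests independence of rows and columns, df = (2-1)*(12-1) = 11.
res_table <- chisq.test(rbind(x, m))

# (b) Peter's call: goodness of fit to equal probabilities 1/12,
#     i.e. expected counts sum(x)/k, df = k - 1 = 11.
res_gof <- chisq.test(x)

# (c) Fixed expected counts of 80, not rescaled to sum(x), df = k.
C2 <- sum((x - m)^2 / m)
p_fixed <- pchisq(C2, df = k, lower.tail = FALSE)

print(c(table = res_table$p.value, gof = res_gof$p.value, fixed = p_fixed))
```

The three p-values answer three different questions, which is why
they disagree; none of them comes close to the Excel figure.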

So this makes me really wonder what you are doing.

The nearest I can get to your Excel result 1.65778E-14 is

  ix<-x<m
  prod(2*ppois(x[ix],m[ix]))*prod(2*(1-ppois(x[!ix],m[!ix])))
## [1] 2.831963e-14

which is based on the guess that independent 2-sided Poisson
tests of agreement between O and E have been carried out on each
component, and the final P-value is the product of these P-values.
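
To make that guess easier to inspect, the same calculation can be laid
out one category at a time. This is only a sketch of my conjecture
about what might have been done, not a recommended procedure; the
names below, p_each and p_prod, are introduced here for illustration.

```r
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
m <- rep(80, length(x))
below <- x < m

# Doubled one-sided Poisson tail for each category:
# 2*P(X <= x) when x is below 80, 2*P(X > x) otherwise
# (1 - ppois(x, m) is P(X > x), the same as lower.tail = FALSE).
p_each <- ifelse(below,
                 2 * ppois(x, m),
                 2 * ppois(x, m, lower.tail = FALSE))

# One row per category, so the individual "P-values" can be inspected.
print(data.frame(observed = x, expected = m, p = signif(p_each, 3)))

# Product of the twelve values, as in the one-line version.
p_prod <- prod(p_each)
print(p_prod)
```

A few categories far below 80 contribute very small factors, and
multiplying twelve such values together drives the product down
quickly, whatever the data look like overall.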

But this doesn't make a lot of sense from a statistical point
of view, so it's time to stop guessing!

Please tell us what hypothesis you are testing, what sort of
distribution the x-values are supposed to have, what the
repeated "80" values represent, and also please tell us
in detail what you asked Excel to do!

Then, perhaps, a useful reply can be made.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Nov-05                                       Time: 21:55:14
------------------------------ XFMail ------------------------------