[R] Chi-squared test

Fri Nov 25 02:50:59 CET 2005

Marc Schwartz wrote:
> On Thu, 2005-11-24 at 21:55 +0000, Ted Harding wrote:
> 
>>On 24-Nov-05 P Ehlers wrote:
>>
>>>Bianca Vieru- Dimulescu wrote:
>>>
>>>>Hello,
>>>>I'm trying to calculate a chi-squared test to see if my data are 
>>>>different from the theoretical distribution or not:
>>>>
>>>>chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
>>>>                 c(80,80,80,80,80,80,80,80,80,80,80,80)))
>>>>
>>>>      Pearson's Chi-squared test
>>>>
>>>>data:  rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
>>>>             c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
>>>>X-squared = 17.6, df = 11, p-value = 0.09142
>>>>
>>>>Is this correct? When I do the same thing in Excel I obtain
>>>>a different p-value (1.65778E-14).
>>>>
>>>>Thanks a lot,
>>>>Bianca
>>>
>>>It would be unusual to have 12 observed frequencies all equal to 80.
>>>So I'm guessing that you have a 12-category variable and want to
>>>test its fit to a discrete uniform distribution. I assume that your
>>>frequencies are
>>>
>>>x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
>>>
>>>Then just use
>>>
>>>chisq.test(x)
>>>
>>>(see the help page).
>>>
>>>(If those 80's are expected cell frequencies, they should sum to
>>>sum(x) = 851.)
>>>
>>>I don't know what Excel does.
>>>
>>>Peter
>>>
>>>Peter Ehlers
>>>University of Calgary
>>
>>I'm rather with Peter on this question! I've tried to infer what
>>you're really trying to do.
>>
>>My a-priori plausible hypothesis was that you have
>>
>>  k<-12
>>
>>independent observations which have equal expected values
>>
>>  m<-rep(80,k)
>>
>>and are observed as
>>
>>  x<-c(79,52,69,71,82,87,95,74,55,78,49,60)
>>
>>On this basis, a chi-squared test Sum((O-E)^2/E) gives
>>
>>  C2<-sum(((x-m)^2)/m)
>>
>>so C2 = 41.1375, and on this hypothesis the chi-squared would
>>have k=12 degrees of freedom. Then:
>>
>>  1-pchisq(C2,k)
>>## [1] 4.647553e-05
>>
>>which is nowhere near the 1.65778E-14 you report from Excel.
>>Also, the result from Peter's chisq.test(x) is p = 0.0006468,
>>even further away.
> 
> 
> It's late on Turkey Day here, but shouldn't that be:
> 
> 
>   1 - pchisq(C2, k - 1)  # 11 df
> 
> [1] 2.282202e-05
> 
> which is what I get using OO.org's Calc 2.0 with the CHITEST function
> using the two vectors as the observed (x) and expected (m) values. I
> also get this result from Gnumeric 1.4.3 using the same CHITEST
> function.
> 
[snip]

Marc, it's a bit sad to see that OO.org copies Excel's behaviour
to a _fault_. As Peter D. points out, we would expect the expected
frequencies and the observed frequencies to sum to the same value.
Excel (and Calc) blithely ignores that. R, OTOH, gives an error
message when the probabilities passed to chisq.test() via its p
argument don't sum to 1.
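
For what it's worth, here is a minimal R sketch of that contrast, using
the counts quoted in this thread. It assumes the 80s are meant as
hypothesised per-category counts; the names p_sheet, err and fit are
purely illustrative, and rescale.p is an argument of chisq.test() in
current versions of R.

```r
# Observed counts from the thread; 80 is the hypothesised per-category count.
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
m <- rep(80, length(x))

# Spreadsheet-style calculation: m is used as given, even though
# sum(m) = 960 differs from sum(x) = 851.
C2 <- sum((x - m)^2 / m)                      # 41.1375, as computed above
p_sheet <- pchisq(C2, df = length(x) - 1, lower.tail = FALSE)

# chisq.test() rejects a p vector that does not sum to 1;
# capture the error message instead of stopping the script.
err <- tryCatch(chisq.test(x, p = m),
                error = function(e) conditionMessage(e))

# With rescale.p = TRUE the 80s are rescaled to 1/12 each, so this is
# the same goodness-of-fit test as chisq.test(x).
fit <- chisq.test(x, p = m, rescale.p = TRUE)

print(p_sheet)
print(err)
print(fit)
```

Note that the original rbind() call hands chisq.test() a 2 x 12 matrix,
which it treats as a contingency table rather than as observed and
expected counts; that is why its output differs from all of the above.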

Turkey soup for a few days now?

Peter Ehlers