[R] chisq.test vs manual calculation - why are different results produced?

Mon Feb 20 15:24:07 CET 2012

On Feb 20, 2012, at 5:57 AM, Louise Mair wrote:

> Hello,
>
> I am trying to fit gamma, negative exponential and inverse power functions
> to a dataset, and then test whether the fit of each curve is good. To do
> this I have been advised to calculate predicted values for bins of data (I
> have grouped a continuous range of distances into 1 km bins), and then apply
> a chi-squared test. Example:
>
>> data <- data.frame(distance=c(1,2,3,4,5,6,7),  
>> observed=c(43,13,10,6,2,1),
> predicted=c(28, 18, 10, 5 ,3, 1, 1))

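There's an error with that code: `observed` has six values while `distance`
and `predicted` each have seven, so data.frame() stops with a
differing-number-of-rows error.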

>
>> chisq.test(data$observed, data$predicted)
>
> Which gives:
>
>        Pearson's Chi-squared test
>
> data:  data$observed and data$predicted
> X-squared = 35, df = 25, p-value = 0.0882
>
> Warning message:
> In chisq.test(data$observed, data$predicted) :
>  Chi-squared approximation may be incorrect
>
> I understand this is due to having observed/predicted values of less than
> five, however I am interested to know firstly why R uses such a large
> number of degrees of freedom (when by my understanding there should only be
> 4 df), and secondly whether using the following manual calculation is
> therefore inappropriate -

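Read the Details section of ?chisq.test, end of the second paragraph. When x
and y are both given as vectors they are coerced to factors and a contingency
table is computed from them, so the test is run on a cross-tabulation of the
distinct values rather than on the counts themselves. That is where the 25 df
come from.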

You probably wanted:

chisq.test(cbind(data$observed, data$predicted))
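
To see where the 25 df come from, here is a minimal sketch. It assumes the
missing seventh observed count is 1. That value is not in the post; it is
used only because it reproduces the X-squared = 35, df = 25 output quoted
above. The names `tab`, `two_vec` and `as_matrix` are illustrative.

```r
# Assumption: the seventh observed count is 1; it is missing from the posted code.
observed  <- c(43, 13, 10, 6, 2, 1, 1)
predicted <- c(28, 18, 10, 5, 3, 1, 1)

# Two vectors are coerced to factors and cross-tabulated: each distinct
# value becomes a level, so this is a 6 x 6 table built from 7 "cases".
tab <- table(observed, predicted)
dim(tab)

# df = (6 - 1) * (6 - 1) = 25, matching the reported output.
two_vec <- suppressWarnings(chisq.test(observed, predicted))
two_vec$statistic
two_vec$parameter

# The matrix form treats the counts as a 7 x 2 contingency table,
# so df = (7 - 1) * (2 - 1) = 6.
as_matrix <- suppressWarnings(chisq.test(cbind(observed, predicted)))
as_matrix$parameter
```

suppressWarnings() is there only to silence the "approximation may be
incorrect" warning caused by the small expected counts.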

>
>> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
>> 1-pchisq(X2,4)
> [1] 0.04114223
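
For comparison, a sketch of the manual calculation next to the one-sample
(goodness-of-fit) form of chisq.test(), which takes the predicted values as
probabilities through the `p` argument. It again assumes a seventh observed
count of 1, which is not in the post. Note two differences: chisq.test()
rescales the expected counts to sum(observed), and it always uses
(number of bins - 1) df, so any reduction for fitted parameters has to be
applied by hand.

```r
# Assumption: the seventh observed count is 1; it is missing from the posted code.
observed  <- c(43, 13, 10, 6, 2, 1, 1)
predicted <- c(28, 18, 10, 5, 3, 1, 1)

# Manual Pearson statistic, treating `predicted` as the expected counts,
# with the poster's choice of 4 df.
X2 <- sum((observed - predicted)^2 / predicted)
p_manual <- pchisq(X2, df = 4, lower.tail = FALSE)  # same as 1 - pchisq(X2, 4)
p_manual

# Goodness-of-fit form: expected counts are sum(observed) * p, so the
# statistic differs from X2 when the two totals disagree (76 vs 66 here).
gof <- suppressWarnings(chisq.test(observed, p = predicted / sum(predicted)))
gof$statistic
gof$parameter  # 7 bins - 1 = 6 df, no adjustment for estimated parameters
```
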
>
> If chi-squared is unsuitable, what other test can I use to determine
> whether my observed and predicted data come from the same distribution? The
> frequently recommended Fisher's test doesn't seem to be any more
> appropriate as it requires values of greater than 5 for contingency tables
> larger than 2 x 2.
>
> Thanks for your help.
>
> Louise
>
> 	[[alternative HTML version deleted]]

Please post in plain text; that is the mail format requested for this list.

David Winsemius, MD
West Hartford, CT