[R] chisq.test vs manual calculation - why are different results produced?
dwinsemius at comcast.net
Mon Feb 20 15:24:07 CET 2012
On Feb 20, 2012, at 5:57 AM, Louise Mair wrote:
> I am trying to fit gamma, negative exponential and inverse power curves
> to a dataset, and then test whether the fit of each curve is good. To do
> this I have been advised to calculate predicted values for bins of data
> (I have grouped a continuous range of distances into 1 km bins), and then
> apply a chi-squared test. Example:
>> data <- data.frame(distance = c(1, 2, 3, 4, 5, 6, 7),
>                     predicted = c(28, 18, 10, 5, 3, 1, 1))
There's an error with that code: the data frame has no observed column, so
data$observed is NULL.
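Because the observed column is missing, here is a minimal sketch of how both columns might be built from raw distances. The raw distances are simulated and the gamma is fitted by method of moments; both are assumptions for illustration, not the poster's data or fitting method. The last bin is left open-ended so that no distance is dropped.

```r
set.seed(1)
# Hypothetical raw distances in km; the poster's raw data is not shown.
dist_km <- rgamma(66, shape = 1.2, rate = 0.6)

# 1 km bins; the last bin is open-ended (> 6 km) so no distance is lost
breaks   <- c(0:6, Inf)
bins     <- cut(dist_km, breaks = breaks, include.lowest = TRUE)
observed <- as.vector(table(bins))      # counts per bin, empty bins included

# Method-of-moments gamma fit: an illustrative stand-in for whatever
# fitting method is actually used
shape <- mean(dist_km)^2 / var(dist_km)
rate  <- mean(dist_km) / var(dist_km)

# Expected count per bin = n * P(distance falls in that bin) under the fit
predicted <- length(dist_km) * diff(pgamma(breaks, shape = shape, rate = rate))

# distance labels each bin's upper edge; bin 7 means "> 6 km"
data <- data.frame(distance = 1:7, observed = observed, predicted = predicted)
data
```

The predicted column holds expected counts, which need not be whole numbers; only the observed column has to be counts.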
>> chisq.test(data$observed, data$predicted)
> Which gives:
> Pearson's Chi-squared test
> data: data$observed and data$predicted
> X-squared = 35, df = 25, p-value = 0.0882
> Warning message:
> In chisq.test(data$observed, data$predicted) :
> Chi-squared approximation may be incorrect
> I understand this is due to having observed/predicted values of less
> than five; however, I am interested to know, firstly, why R uses such a
> large number of degrees of freedom (when by my understanding there should
> only be 4 df), and secondly, whether using the following manual
> calculation is therefore inappropriate:
Read the Details section of the help page (?chisq.test), at the end of the
second paragraph.
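To see where the 25 comes from: given two vectors, chisq.test() coerces them to factors and tests independence in their cross-tabulation, so df = (rows - 1) * (columns - 1). The observed counts below are hypothetical (the poster's were not included), chosen so that each vector has six distinct values, as the reported df implies.

```r
# Hypothetical counts, invented purely for illustration
predicted <- c(28, 18, 10, 5, 3, 1, 1)
observed  <- c(27, 19, 11, 4, 2, 2, 1)

# Two vectors -> a contingency table of their distinct values
tab <- table(observed, predicted)
dim(tab)                              # 6 6: six distinct values in each
(nrow(tab) - 1) * (ncol(tab) - 1)     # 25

# The same df that chisq.test(x, y) reports
ct <- suppressWarnings(chisq.test(observed, predicted))
ct$parameter                          # df = 25
```

This is a test of independence between the two columns, not a goodness-of-fit test of observed against predicted.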
You probably wanted:
>> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
>  0.04114223
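The manual statistic is the goodness-of-fit form of the test, which chisq.test() computes when given a single vector x and expected proportions p. A sketch with the same hypothetical counts: its df is k - 1 = 6 and takes no account of fitted parameters. The k - 1 - m adjustment at the end is the usual textbook correction, and m = 2 is an assumption (a two-parameter curve fitted to 7 bins would give the 4 df expected in the question).

```r
predicted <- c(28, 18, 10, 5, 3, 1, 1)
observed  <- c(27, 19, 11, 4, 2, 2, 1)   # hypothetical counts, sum = 66

# Manual Pearson statistic, as in the quoted calculation
X2 <- sum((observed - predicted)^2 / predicted)

# Goodness-of-fit form: one vector x plus expected proportions p
gof <- suppressWarnings(
  chisq.test(observed, p = predicted, rescale.p = TRUE)
)
gof$statistic   # equals X2 here because sum(observed) == sum(predicted)
gof$parameter   # df = 7 - 1 = 6

# chisq.test() does not know the curve's parameters were estimated.
# Assumption: m = 2 parameters were fitted to the same data.
m      <- 2
df_adj <- length(observed) - 1 - m      # 4
p_adj  <- pchisq(X2, df = df_adj, lower.tail = FALSE)
p_adj
```

The manual statistic and the goodness-of-fit statistic agree only when the predicted counts sum to the observed total; rescale.p = TRUE makes chisq.test() scale p to the observed total either way.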
> If chi-squared is unsuitable, what other test can I use to determine
> whether my observed and predicted data come from the same distribution?
> The frequently recommended Fisher's test doesn't seem to be any more
> appropriate, as it requires values greater than 5 for contingency
> tables larger than 2 x 2.
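One option within chisq.test() itself is a Monte Carlo p-value, which samples from the expected proportions instead of relying on the chi-squared approximation that small expected counts undermine. A sketch with the same hypothetical counts; B = 10000 is an arbitrary choice. The simulation treats p as known, so it does not account for parameters estimated from the data (a parametric bootstrap that refits the curve to each simulated sample would).

```r
predicted <- c(28, 18, 10, 5, 3, 1, 1)
observed  <- c(27, 19, 11, 4, 2, 2, 1)   # hypothetical counts

set.seed(42)
# Draws B samples of size sum(observed) from the distribution given by p
# and compares their statistics with the observed one
mc <- chisq.test(observed, p = predicted, rescale.p = TRUE,
                 simulate.p.value = TRUE, B = 10000)
mc$p.value
mc$parameter    # NA: no chi-squared reference distribution is used
```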
> Thanks for your help.
> [[alternative HTML version deleted]]
Plain text is requested as the mail format.
David Winsemius, MD
West Hartford, CT