[R] chisq.test using amalgamation automatically (possible ?!?)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jun 27 09:32:26 CEST 2005


You have actually used chisq.test to test independence of the cross 
tabulation of x and y as factors, a 9 x 27 table of 0s and 1s with a 
single 1 in each column (hence df = 8 * 26 = 208).  I doubt this was your 
intention, but unfortunately you have not told us your actual intention.
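
A minimal sketch of what chisq.test does when it is handed two vectors. The x below is the observed vector from the quoted post; y_standin is a hypothetical stand-in for the poster's y, on the assumption that only the fact that all 27 values are distinct matters for the cross-tabulation. Under that assumption it should reproduce the df = 208 and X-squared = 216 seen in the quoted output.

```r
# Observed counts, copied from the quoted post.
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)

# ASSUMPTION: a stand-in for the poster's y. Any 27 distinct values
# lead to the same contingency table structure.
y_standin <- seq_along(x)

# chisq.test(x, y) coerces both vectors to factors and cross-tabulates them.
tab <- table(x, y_standin)
dim(tab)   # distinct values of x by distinct values of y

# The test of independence on that table (the warning is suppressed here).
res <- suppressWarnings(chisq.test(x, y_standin))
res$parameter   # (nrow(tab) - 1) * (ncol(tab) - 1)
res$statistic
```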

Perhaps you intended y to be the expected values, but as they do not have 
the same sum as x it is not clear what distribution is appropriate.
(The standard theory assumes that the expected values were obtained by 
multiplying supplied probabilities by the total count, which is why df=9 
would be used with 10 categories.)
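
For illustration, a goodness-of-fit call in which probabilities are supplied and the expected values are derived from the total count might look like the sketch below. The counts and probabilities are hypothetical, and whether rescaling a vector of "expected" values into probabilities (rescale.p = TRUE) is appropriate for any particular data set is a separate question that this sketch does not settle.

```r
# Hypothetical observed counts and hypothesised probabilities.
obs <- c(18, 22, 30, 30)
p   <- c(0.2, 0.2, 0.3, 0.3)

# With p supplied, chisq.test does a goodness-of-fit test.
gof <- chisq.test(obs, p = p)
gof$expected    # p * sum(obs), so the totals agree by construction
gof$parameter   # number of categories minus 1

# If only unnormalised values are available, rescale.p = TRUE divides
# them by their sum before they are used as probabilities.
e_raw <- c(10, 10, 15, 15)   # hypothetical, proportional to p
gof2  <- chisq.test(obs, p = e_raw, rescale.p = TRUE)
```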

You can use the expected values _if known in advance_ to amalgamate 
categories, but in most uses of chisq.test they are not known in advance.
In any case, without some knowledge of the context, you cannot decide 
which categories should be merged: your choices are arbitrary unless the 
categories are ordered.  Suppose they were types of fruit?
If you do know which categories can sensibly be merged, then certainly you 
can program R to do the amalgamation for you.
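
One possible way to program the amalgamation, assuming the categories are ordered and the expected values are known in advance. The function name amalgamate_ordered, its arguments and the threshold of 5 are hypothetical choices for this sketch: it sweeps from left to right, pooling adjacent categories until each pool has an expected value of at least min_expected, and folds any undersized tail into the previous pool.

```r
# x and y are copied from the quoted post.
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
y <- c(9.13112391745095, 13.1626482033341, 12.6623267638188,
       11.0130706413029, 9.16415925139016, 7.47441794889028,
       6.03743388141852, 4.85350508692505, 3.89248001363859,
       3.11803140037476, 2.49617540962629, 1.99774139023269,
       1.5985926374167, 1.27909653584089, 1.02341602646530,
       0.818828097315106, 0.655132353196336, 0.524159229418155,
       0.418022824890164, 0.335528136508225, 0.268448671671046,
       0.214779801990545, 0.171840507806838, 0.137485729582785,
       0.109999238967747, 0.0880079144684513, 0.070413150156564)

# Hypothetical helper: pool ADJACENT categories (so order must be meaningful).
amalgamate_ordered <- function(obs, expd, min_expected = 5) {
  stopifnot(length(obs) == length(expd), sum(expd) >= min_expected)
  group <- integer(length(expd))
  g <- 1L
  running <- 0
  for (i in seq_along(expd)) {
    group[i] <- g
    running <- running + expd[i]
    if (running >= min_expected) {   # pool is large enough: start a new one
      g <- g + 1L
      running <- 0
    }
  }
  # An unfinished pool at the tail is merged into the previous pool.
  if (g > 1L && any(group == g)) group[group == g] <- g - 1L
  list(observed = as.vector(tapply(obs, group, sum)),
       expected = as.vector(tapply(expd, group, sum)),
       group    = group)
}

m <- amalgamate_ordered(x, y)
m$group   # which pool each original category went into

# Pearson statistic on the pooled categories.
Chi.Sq <- sum((m$observed - m$expected)^2 / m$expected)
```

The sweep direction and the handling of the tail are design choices; a different rule would generally give different pools, which is why the expected values need to be fixed before the data are examined.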

BTW, it is just confusing (at least to your readers) to supply the default 
values of arguments explicitly.  pchisq(Chi.Sq, df=9, lower.tail=FALSE) 
would suffice, since only lower.tail=FALSE differs from the defaults.
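
A small check of which pchisq arguments are defaults, using a hypothetical statistic stat. In pchisq, ncp = 0 and log.p = FALSE match the default behaviour, whereas lower.tail defaults to TRUE, so the upper-tail p-value needs lower.tail = FALSE. The comparison uses all.equal rather than identical because, as far as I know, supplying ncp explicitly switches to the non-central algorithm, which may differ in the last digits.

```r
stat <- 4.2   # hypothetical value of a chi-squared statistic

# Upper-tail probability, long and short forms of the same call.
full  <- pchisq(stat, df = 9, ncp = 0, lower.tail = FALSE, log.p = FALSE)
short <- pchisq(stat, df = 9, lower.tail = FALSE)
isTRUE(all.equal(full, short))

# Dropping lower.tail gives the LOWER tail instead.
lower <- pchisq(stat, df = 9)
lower + short   # the two tails together cover the whole distribution
```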


On Sun, 26 Jun 2005, Mohammad Ehsanul Karim wrote:

> Dear List,
>
>
> If any observed and/or expected frequency is less than
> 5, then chisq.test (Pearson's Chi-squared Test for
> Count Data from package:stats) gives a warning
> message. For example,
>
> x<-c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
> y<-c(9.13112391745095, 13.1626482033341,
> 12.6623267638188, 11.0130706413029, 9.16415925139016,
> 7.47441794889028, 6.03743388141852, 4.85350508692505,
> 3.89248001363859, 3.11803140037476, 2.49617540962629,
> 1.99774139023269, 1.5985926374167, 1.27909653584089,
> 1.02341602646530, 0.818828097315106,
> 0.655132353196336, 0.524159229418155,
> 0.418022824890164, 0.335528136508225,
> 0.268448671671046, 0.214779801990545,
> 0.171840507806838, 0.137485729582785,
> 0.109999238967747, 0.0880079144684513,
> 0.070413150156564)
>
> # Amalgamate cells 8-9, 10-11 and 12-27
> obs  <- c(x[1:7], sum(x[8:9]), sum(x[10:11]), sum(x[12:27]))
> expd <- c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27]))
> Chi.Sq <- sum((obs - expd)^2 / expd) # using amalgamation
> pchisq(Chi.Sq, df=9, ncp=0, lower.tail = FALSE, log.p
> = FALSE) # result being 0.8830207
>
> but chisq.test(x,y) gives the following output with
> incorrect df:
>
>        Pearson's Chi-squared test
>
> data:  x and y
> X-squared = 216, df = 208, p-value = 0.3373
>
> Warning message:
> Chi-squared approximation may be incorrect in:
> chisq.test(x, y)
>
> Is there any way to use chisq.test directly without
> getting a warning message in such cases (that is, to
> have the amalgamation done automatically, perhaps by
> means of programming, so that we don't have to check
> whether each element is less than 5)?
>
> Any hint, help, support, references will be highly
> appreciated.
> Thank you for your time.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



