[R] chisq.test using amalgamation automatically (possible ?!?)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jun 27 09:32:26 CEST 2005
You have actually used chisq.test to test independence of the cross
tabulation of x and y as factors: a sparse 9 x 27 table (one row per
distinct value of x, one column per distinct value of y) with a single 1
in each column and 0 elsewhere, hence df = 8 * 26 = 208. I doubt this was
your intention, but unfortunately you have not told us your actual
intention.
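As a minimal sketch of that behaviour (the vectors `obs` and `expd` below are short hypothetical stand-ins, not the posted data): when both arguments are plain vectors, chisq.test cross-tabulates them as factors, so the degrees of freedom depend only on the numbers of distinct values. For the posted data, 9 distinct values in x and 27 in y give the (9-1)*(27-1) = 208 shown in the quoted output.

```r
# Hypothetical short vectors standing in for counts and fitted values
obs  <- c(10, 14, 10, 11, 7)            # 4 distinct values
expd <- c(9.1, 13.2, 12.7, 11.0, 9.2)   # 5 distinct values

# This is the table that chisq.test(obs, expd) builds internally
tab <- table(obs, expd)
dim(tab)                  # rows = distinct obs, columns = distinct expd

# Warning suppressed: the table is sparse, so the approximation is poor
res <- suppressWarnings(chisq.test(obs, expd))
res$parameter             # df = (nrow(tab) - 1) * (ncol(tab) - 1)
```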
Perhaps you intended y to be the expected values, but as they do not have
the same sum as x it is not clear what distribution is appropriate.
(The standard theory assumes that the expected values were obtained by
multiplying supplied probabilities by the total count, which is why df=9
would be used with 10 categories.)
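If a goodness-of-fit test was intended, one possible form in current versions of R is sketched below. It assumes the second vector holds expected counts that may legitimately be rescaled to probabilities; the names `obs` and `expd` and their values are illustrative only.

```r
obs  <- c(18, 25, 30, 27)   # hypothetical observed counts (sum 100)
expd <- c(20, 20, 35, 30)   # hypothetical expected counts (sum 105)

# rescale.p = TRUE rescales p to sum to 1, so the expected counts used
# are sum(obs) * expd / sum(expd)
gof <- chisq.test(obs, p = expd, rescale.p = TRUE)
gof$parameter   # df = number of categories - 1
gof$expected    # expected counts, now with the same total as obs
```

Whether such rescaling is appropriate depends on how the expected values were obtained, which is exactly the information missing here.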
You can use the expected values _if known in advance_ to amalgamate
categories, but in most uses of chisq.test they are not known in advance.
In any case, without some knowledge of the context, you cannot decide
which categories should be merged: your choices are arbitrary unless the
categories are ordered. Suppose they applied to types of fruit?
If you know that, then certainly you can program R to do the amalgamation
for you.
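One way such an amalgamation might be programmed is sketched here. It assumes the categories are ordered and the expected values are known in advance; the function name `amalgamate`, its argument `min.exp`, and the data are all hypothetical. Adjacent categories are pooled from the left until each pooled expected count reaches the threshold, and a short final group is folded into its neighbour.

```r
# Pool adjacent (ordered) categories until each pooled expected count
# reaches min.exp; a too-small final group is merged into the previous one.
amalgamate <- function(obs, expd, min.exp = 5) {
  stopifnot(length(obs) == length(expd))
  grp <- integer(length(expd))
  g <- 1L
  running <- 0
  for (i in seq_along(expd)) {
    grp[i] <- g
    running <- running + expd[i]
    if (running >= min.exp) {   # group is large enough: start a new one
      g <- g + 1L
      running <- 0
    }
  }
  last <- max(grp)
  if (last > 1L && sum(expd[grp == last]) < min.exp) {
    grp[grp == last] <- last - 1L   # fold the leftover tail
  }
  list(observed = as.vector(tapply(obs, grp, sum)),
       expected = as.vector(tapply(expd, grp, sum)))
}

obs  <- c(12, 9, 6, 3, 2, 1, 0)               # hypothetical counts
expd <- c(11, 9.5, 6.5, 3.2, 1.6, 0.8, 0.4)   # hypothetical expected values
m <- amalgamate(obs, expd)

stat <- sum((m$observed - m$expected)^2 / m$expected)
# df assumes no parameters were estimated from the data
pchisq(stat, df = length(m$expected) - 1, lower.tail = FALSE)
```

With these made-up inputs the last four categories end up pooled into one. For unordered categories there is no natural notion of "adjacent", so a rule like this would be arbitrary.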
BTW, it is just confusing (at least to your readers) to supply the default
values of arguments explicitly. pchisq(Chi.Sq, df=9, lower.tail=FALSE)
would suffice.
On Sun, 26 Jun 2005, Mohammad Ehsanul Karim wrote:
> Dear List,
>
>
> If any of the observed and/or expected frequencies is
> less than 5, then chisq.test (Pearson's Chi-squared
> Test for Count Data from package:stats) gives a warning
> message. For example,
>
> x<-c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
> y<-c(9.13112391745095, 13.1626482033341,
> 12.6623267638188, 11.0130706413029, 9.16415925139016,
> 7.47441794889028, 6.03743388141852, 4.85350508692505,
> 3.89248001363859, 3.11803140037476, 2.49617540962629,
> 1.99774139023269, 1.5985926374167, 1.27909653584089,
> 1.02341602646530, 0.818828097315106,
> 0.655132353196336, 0.524159229418155,
> 0.418022824890164, 0.335528136508225,
> 0.268448671671046, 0.214779801990545,
> 0.171840507806838, 0.137485729582785,
> 0.109999238967747, 0.0880079144684513,
> 0.070413150156564)
>
> # using amalgamation
> Chi.Sq <- sum((c(x[1:7], sum(x[8:9]), sum(x[10:11]), sum(x[12:27])) -
>                c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27])))^2 /
>               c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27])))
> pchisq(Chi.Sq, df=9, ncp=0, lower.tail = FALSE, log.p = FALSE)
> # result being 0.8830207
>
> but chisq.test(x,y) gives the following output with
> incorrect df:
>
> Pearson's Chi-squared test
>
> data: x and y
> X-squared = 216, df = 208, p-value = 0.3373
>
> Warning message:
> Chi-squared approximation may be incorrect in:
> chisq.test(x, y)
>
> Is there any way that we can use chisq.test directly
> without getting a warning message in such cases (that
> is, using amalgamation conveniently so that we don't
> have to check whether each element is less than 5 or
> not - the whole process being automatic, maybe by
> means of programming)?
>
> Any hint, help, support, references will be highly
> appreciated.
> Thank you for your time.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595