[R] Chi-squared test
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Fri Nov 25 02:14:22 CET 2005
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
> On 24-Nov-05 P Ehlers wrote:
> > Bianca Vieru-Dimulescu wrote:
> >> Hello,
> >> I'm trying to calculate a chi-squared test to see if my data are
> >> different from the theoretical distribution or not:
> >>
> >> chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
> >>                  c(80,80,80,80,80,80,80,80,80,80,80,80)))
> >>
> >> Pearson's Chi-squared test
> >>
> >> data: rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
> >> c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
> >> X-squared = 17.6, df = 11, p-value = 0.09142
> >>
> >> Is this correct? If I do the same thing in Excel I obtain
> >> a different value of p (1.65778E-14).
> >>
> >> Thanks a lot,
> >> Bianca
> >
> > It would be unusual to have 12 observed frequencies all equal to 80.
> > So I'm guessing that you have a 12-category variable and want to
> > test its fit to a discrete uniform distribution. I assume that your
> > frequencies are
> >
> > x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
> >
> > Then just use
> >
> > chisq.test(x)
> >
> > (see the help page).
> >
> > (If those 80's are expected cell frequencies, they should sum to
> > sum(x) = 851.)
> >
> > I don't know what Excel does.
> >
> > Peter
> >
> > Peter Ehlers
> > University of Calgary
>
> I'm rather with Peter on this question! I've tried to infer what
> you're really trying to do.
>
> My a-priori plausible hypothesis was that you have
>
> k<-12
>
> independent observations which have equal expected values
>
> m<-rep(80,k)
>
> and are observed as
>
> x<-c(79,52,69,71,82,87,95,74,55,78,49,60)
>
> On this basis, a chi-squared test Sum((O-E)^2/E) gives
>
> C2<-sum(((x-m)^2)/m)
>
> so C2 = 41.1375, and on this hypothesis the chi-squared would
> have k=12 degrees of freedom. Then:
>
> 1-pchisq(C2,k)
> ## [1] 4.647553e-05
>
> which is nowhere near the 1.65778E-14 you report from Excel.
> Also, the result from Peter's chisq.test(x) is p = 0.0006468,
> even further away.
>
> So this makes me really wonder what you are doing.
>
> The nearest I can get to your Excel result 1.65778E-14 is
>
> ix<-x<m
> prod(2*ppois(x[ix],m[ix]))*prod(2*(1-ppois(x[!ix],m[!ix])))
> ## 2.831963e-14
>
> which is based on the guess that independent 2-sided Poisson
> tests of agreement between O and E have been carried out on each
> component, and the final P-value is the product of these P-values.
>
> But this doesn't make a lot of sense from a statistical point
> of view, so it's time to stop guessing!
>
> Please tell us what hypothesis you are testing, what sort of
> distribution the x-values are supposed to have, what the
> repeated "80" values represent, and also please tell us
> in detail what you asked Excel to do!
>
> Then, perhaps, a useful reply can be made.
I think what Excel does is outlined here:
http://www.gifted.uconn.edu/siegle/research/ChiSquare/chiexcel.htm
(Notice the helpful wizard which in step 2 claims that you are doing a
test for independence, not for a given distribution.)
This would seem to coincide with Peter E's guess. The example on that
page matches chisq.test(c(10,3,2))
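
To make the distinction concrete, here is a minimal sketch contrasting the two calls on the counts from the original post. The names gof and ind are only illustrative, and the reading of the 80s as hypothesized expected counts (rather than observed data) is Peter E's guess, not something the poster has confirmed.

```r
# Observed counts from the original post
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)

# Goodness of fit to a discrete uniform distribution. Equal
# probabilities are the default; p is spelled out here for clarity.
gof <- chisq.test(x, p = rep(1 / length(x), length(x)))
gof$parameter   # df = 11
gof$p.value     # the p = 0.0006468 quoted above

# The original call instead treats the 80s as a second row of
# observed counts and tests independence in a 2 x 12 table.
ind <- chisq.test(rbind(x, rep(80, length(x))))
ind$statistic   # the X-squared = 17.6 reported in the original post
ind$p.value
```

Both tests have 11 degrees of freedom, but they answer different questions, which is why the p-values differ so much.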
I believe that the expected values are expected (!) to sum to the
total count. If they do not, I guess that Excel is numb-skulled
enough to compute sum((O-E)^2/E) anyway and look up its p-value
with k-1 DF. That still gets you nowhere near 1.6e-14, though.
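
A rough sketch of that guess, for anyone who wants to check the arithmetic. It assumes the unscaled 80s were supplied as expected values and that the statistic is referred to k-1 degrees of freedom. This is my conjecture about Excel's behaviour, not documented fact, and the variable names are only illustrative.

```r
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
E <- rep(80, length(x))   # expected values as given; sum(E) is 960, not sum(x)
k <- length(x)

# Naive statistic using expected values that do not sum to the total count
C2 <- sum((x - E)^2 / E)  # 41.1375, as in Ted's calculation
p_naive <- pchisq(C2, df = k - 1, lower.tail = FALSE)

# Rescaling the expected values to sum to sum(x) recovers the usual
# goodness-of-fit test, i.e. the same p-value as chisq.test(x)
E_scaled <- E * sum(x) / sum(E)
C2_scaled <- sum((x - E_scaled)^2 / E_scaled)
p_scaled <- pchisq(C2_scaled, df = k - 1, lower.tail = FALSE)

c(naive = p_naive, scaled = p_scaled)
```

The naive p-value comes out on the order of 1e-05, so this guess does not explain the Excel figure either.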
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907