[R] table problems
ripley@stats.ox.ac.uk
Wed Jun 12 09:13:46 CEST 2002
On Wed, 12 Jun 2002, Robin Hankin wrote:
>
> dear helplist,
>
> my student has fifty trees, numbered one to fifty, and a vector
> recording which tree a certain possum slept in on 12 nights.
>
> R> c
> [1] 3 14 17 22 26 26 17 40 43 25 46 46
> R>
>
> Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on.
> I wish to test the null hypothesis that the animal chooses trees
> randomly; try
>
> R> table(c)
> c
> 3 14 17 22 25 26 40 43 46
> 1 1 2 1 1 2 1 1 2
> >
>
> Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc.
Try tabulate(c), which only goes up to 46, the largest tree number
observed. Or, better,

    tab <- rep(0, 50)
    names(tab) <- 1:50
    tab[names(table(c))] <- table(c)

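
An alternative way to keep the empty trees, offered only as a sketch: declare all 50 tree numbers as factor levels before tabulating. The name `tree` is an assumption made here for readability (the post calls the vector `c`, which is easy to confuse with the base function c()); the data are copied from the question.

```r
# Nights-by-tree data from the original post, renamed from `c`.
tree <- c(3, 14, 17, 22, 26, 26, 17, 40, 43, 25, 46, 46)

# Declaring all 50 levels makes table() report the zero counts too.
tab <- table(factor(tree, levels = 1:50))

length(tab)   # one cell per tree
sum(tab)      # total number of nights

# tabulate() returns the same counts (without names) once nbins is given.
counts <- tabulate(tree, nbins = 50)
all(counts == as.vector(tab))
```

Either object can be passed to chisq.test() in place of the hand-built `tab`.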
> Now on the null hypothesis the expected number of sleeps per tree is
> 12/50=0.24; so how do I carry out a chisquare test on the data,
> including the trees that it never slept in?
>
> chisq.test() doesn't "know" that there are actually fifty distinct
> trees (most of which were never slept in) and not nine.
>
> > chisq.test(table(c))
>
> Chi-squared test for given probabilities
>
> data: table(c)
> X-squared = 1.5, df = 8, p-value = 0.9927
>
> of course this isn't right because chisquared is > 25.8 due to the
> animal sleeping in tree #17 and tree #46 twice (and of course, df
> should be 49 because I have 50 trees).
> chisq.test(tab)

        Chi-squared test for given probabilities

data:  tab
X-squared = 63, df = 49, p-value = 0.08625

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(tab)

The warning is serious: the approximation is probably dreadful for data
this sparse. In any case, is the null hypothesis plausible, namely that
the animal independently and uniformly chooses a tree to sleep in each
night, from exactly the 50 trees your student labelled?
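
For reference, the statistic reported by chisq.test(tab) can be reproduced by hand, which also shows why the warning appears: every expected count is 12/50 = 0.24, far below the usual rule of thumb of about 5. This is a minimal sketch; the names `tree`, `obs` and `expected` are illustrative and not from the original post.

```r
# Data from the original post (renamed from `c`).
tree <- c(3, 14, 17, 22, 26, 26, 17, 40, 43, 25, 46, 46)

obs      <- tabulate(tree, nbins = 50)   # counts for all 50 trees
expected <- length(tree) / 50            # 0.24 sleeps per tree under the null

# Pearson statistic: sum over trees of (observed - expected)^2 / expected.
stat <- sum((obs - expected)^2 / expected)
stat                                     # 63

# Upper-tail chi-squared probability on 50 - 1 = 49 degrees of freedom;
# this is the asymptotic p-value that the warning says not to trust here.
p_asym <- pchisq(stat, df = 49, lower.tail = FALSE)
p_asym
```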
You could easily get a more accurate significance by simulation:
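doone <- function(...)
{
    ## one simulated data set under the null: 12 nights, 50 equally likely trees
    c <- sample(1:50, 12, replace = TRUE)
    ## counts for all 50 trees, including those never used
    tab <- rep(0, 50)
    names(tab) <- 1:50
    tab[names(table(c))] <- table(c)
    ## chisq.test() warns about small expected counts on every call
    chisq.test(tab)$statistic
}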
> table(round(sapply(1:1000, doone),3))
    38 46.333 54.667     63 71.333 79.667     88 96.333
   232    394    229     97     34      6      7      1
(Note how discrete the distribution is.)
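
In the table above, 97 + 34 + 6 + 7 + 1 = 145 of the 1000 simulated statistics are at least as large as the observed 63, which suggests a simulated p-value of roughly 0.145 rather than 0.086. The sketch below computes such a proportion directly. The seed and the names `n_trees`, `n_nights` and `sim_stat` are arbitrary choices for illustration, and the statistic is computed by hand instead of through chisq.test() so that no warnings are raised.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n_trees  <- 50
n_nights <- 12
expected <- n_nights / n_trees

# Pearson statistic for one data set simulated under the null hypothesis.
sim_stat <- function() {
  nights <- sample(n_trees, n_nights, replace = TRUE)
  counts <- tabulate(nights, nbins = n_trees)
  sum((counts - expected)^2 / expected)
}

sims <- replicate(1000, sim_stat())

# Proportion of simulated statistics at least as large as the observed 63;
# the small tolerance guards against floating-point ties.
p_mc <- mean(sims >= 63 - 1e-8)
p_mc
```

The statistic equals sum(counts^2)/0.24 - 12, and sum(counts^2) can only change in steps of 2, so the attainable values are spaced 2/0.24 = 8.333 apart, as the table shows. In current versions of R, chisq.test(tab, simulate.p.value = TRUE, B = 1000) performs a comparable Monte Carlo test.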
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595