[R] Goodness of fit to Poisson / NegBinomial
Mark Myatt
mark at myatt.demon.co.uk
Thu Feb 8 11:13:09 CET 2001
I (Mark Myatt) wrote:
> > I have some data on parasites on apple leaves and want to do a
> > goodness of fit test to a Poisson distribution. This seems to
> > do it:
> >
> > mites <- c(rep(0,70),
> > rep(1,38),
> > rep(2,17),
> > rep(3,10),
> > rep(4,9),
> > rep(5,3),
> > rep(6,2),
> > rep(7,1))
> >
> > tab <- table(mites)
> > NSU <- length(mites)
> > N <- sum(mites)
> > NSU.Mean <- N / NSU
> > exp <- dpois(as.numeric(dimnames(tab)[[1]]), NSU.Mean) * NSU
> > chi2 <- sum(((as.vector(tab) - exp)^2) / exp)
> >
> > tab
> > exp
> > chi2
> >
> > I have some small expected values and so need to collapse some
> > categories. If anyone has code to do that already ... but my
> > question is how would I do the same for a negative binomial
> > distribution?
Jim Lindsey wrote:
>> The negative binomial density function is available in my rmutil
>> library. (www.luc.ac.be/~jlindsey/rcode.html)
Brian Ripley wrote:
>It is in R too, dnbinom.
That was silly of me ... It is even there on the dpois() help page that
I had already visited. I have a further question ... I am not familiar
with the parameters required for dnbinom:
dnbinom(x, size, prob, log = FALSE)
I am used to 'mu' and 'k'. The parameter 'prob' is I think the failure
probability:
tab[[1]] / NSU
Is that right? I am unsure of how to specify the 'size' parameter in
this context.
I have a proto-function working without dnbinom():
neg.bin.gof <- function(data)
{
case.count <- vector(mode = "numeric", length = max(data) + 2)
observed <- vector(mode = "numeric", length = max(data) + 2)
neg.bin.expected <- vector(mode = "numeric", length = max(data) + 2)
neg.bin.partial.chi.square <- vector(mode = "numeric", length =
max(data) + 2)
#
# calculate frequencies
#
for (i in 0:(max(data) + 1))
{
case.count[i + 1] <- i
observed[i + 1] <- sum(data == i)
}
#
# calculate summary measures
#
N.SU <- length(data)
cases <- sum(data)
cases.per.SU.mean <- cases / N.SU
cases.per.SU.var <- var(data)
#
# find negative parameter 'k' by successive approximation
#
delta.direction <- 0
delta.difference <- 1
k <- cases.per.SU.mean ^ 2 / (cases.per.SU.var - cases.per.SU.mean)
repeat
{
difference <- log10(N.SU / observed[1]) - k * log10(1 +
(cases.per.SU.mean / k))
if(abs(difference) < 0.0000001) break
if (delta.direction != sign(difference)) delta.difference <-
delta.difference / 10
delta.direction <- sign(difference)
k <- k + delta.direction * delta.difference
}
#
# calculate expected number of sampling units with ZERO cases
#
neg.bin.expected[1] <- (1 + (cases.per.SU.mean / k)) ^ (0 - k) * N.SU
neg.bin.sum.expected <- neg.bin.expected[1]
#
# calculate expected number of sampling units with 1, 2, etc. cases
#
for (i in 2:(max(data) + 1))
{
neg.bin.expected[i] <- (cases.per.SU.mean / (cases.per.SU.mean + k))
* ((k + i - 1 - 1) / (i - 1)) * neg.bin.expected[i - 1]
neg.bin.sum.expected <- neg.bin.sum.expected + neg.bin.expected[i]
}
neg.bin.expected[max(data) + 2] <- N.SU - neg.bin.sum.expected
#
# calculate chi-square statistic
#
neg.bin.partial.chi.square <- (observed - neg.bin.expected) ^ 2 /
neg.bin.expected
neg.bin.chi.square.statistic <- sum(neg.bin.partial.chi.square)
#
# some output ... punt out as a list when function debugged
#
output.table <- cbind(case.count, observed, neg.bin.expected,
neg.bin.partial.chi.square)
print(output.table)
print(neg.bin.chi.square.statistic)
print(k)
}
But I really should plug a gap in my ignorance.
Mark
--
Mark Myatt
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list