[R-sig-eco] testing for distribution
Erika Mudrak
mudrak at wisc.edu
Wed May 13 20:17:59 CEST 2009
Jacob, you can use a chi-squared goodness-of-fit test, chisq.test(), for discrete distributions like the negative binomial, and a Kolmogorov-Smirnov test, ks.test(), for continuous distributions. Both produce a p-value for the null hypothesis that your data come from the given distribution with the stated parameters. Use the parameter estimates from your fitdistr() results. If p > 0.05 (or 0.1, or whatever threshold you choose), you cannot reject the hypothesis that your data come from that distribution; that is not proof that they do. Also, because the parameters were estimated from the same data, these p-values tend to be too large unless you correct for the estimation.
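For discrete distributions, try something like:
library(MASS)
fit <- fitdistr(ActualData, "negative binomial")   # estimates "size" (= k) and "mu"
k <- fit$estimate[["size"]]; mu <- fit$estimate[["mu"]]; x <- 0:max(ActualData)
obs <- table(factor(ActualData, levels = x))         # observed frequency of each count
p <- dnbinom(x, size = k, mu = mu)                   # fitted probability of each count
p[length(p)] <- pnbinom(max(x) - 1, size = k, mu = mu, lower.tail = FALSE)  # last class = "max or more"
chisq.test(x = obs, p = p)
# This is akin to quantitatively comparing your histograms...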
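A chisq.test() call like the one above treats the fitted size and mu as if they were known, and with many zeros and a long tail several classes will have expected counts below 5 (chisq.test() then warns that the approximation may be incorrect). A minimal sketch of one common remedy, assuming simulated counts in place of ActualData: pool the upper tail until every class has an expected count of at least 5, and compute the p-value by hand with 2 fewer degrees of freedom for the two estimated parameters. The simulated data, the "at least 5" rule, and the names cut and p_value are illustrative choices, not part of the original post. The last line compares the observed and fitted proportion of zeros, which speaks to the zero-inflation question.

```r
library(MASS)
set.seed(42)
# Hypothetical example data: aggregated counts with many zeros (stand-in for ActualData)
counts <- rnbinom(200, size = 0.5, mu = 3)
n <- length(counts)

fit <- fitdistr(counts, "negative binomial")
k  <- fit$estimate[["size"]]
mu <- fit$estimate[["mu"]]

# Keep single classes 0, 1, ... while both the class and the remaining tail
# have an expected count >= 5; everything from 'cut' upward is pooled.
cut <- 0
while (n * dnbinom(cut, size = k, mu = mu) >= 5 &&
       n * pnbinom(cut, size = k, mu = mu, lower.tail = FALSE) >= 5) {
  cut <- cut + 1
}

obs <- as.vector(table(factor(pmin(counts, cut), levels = 0:cut)))
p <- c(dnbinom(seq_len(cut) - 1, size = k, mu = mu),             # P(X = 0), ..., P(X = cut - 1)
       pnbinom(cut - 1, size = k, mu = mu, lower.tail = FALSE))  # P(X >= cut)
expected <- n * p

stat <- sum((obs - expected)^2 / expected)
df <- length(obs) - 1 - 2          # classes - 1 - (2 estimated parameters)
stopifnot(df >= 1)                 # otherwise there are too few classes for a test
p_value <- pchisq(stat, df = df, lower.tail = FALSE)
c(statistic = stat, df = df, p.value = p_value)

# Zero-inflation check: observed vs fitted proportion of zeros
c(observed = mean(counts == 0), fitted = dnbinom(0, size = k, mu = mu))
```

If the observed proportion of zeros is clearly larger than the fitted one, a zero-inflated model may be worth considering.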
For continuous distributions (such as the beta), the code would be:
fit <- fitdistr(...)   # for "beta", fitdistr() needs start = list(shape1 = ..., shape2 = ...)
ks.test(ActualData, "pbeta", shape1 = fit$estimate[["shape1"]], shape2 = fit$estimate[["shape2"]])
# I've done this successfully
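Because the beta shape parameters are estimated from the same data, the ks.test() p-value is conservative. One common remedy, swapped in here and not part of the original post, is a parametric bootstrap: simulate many data sets from the fitted beta, refit each one, and count how often the simulated D statistic is at least as large as the observed one. A minimal sketch, assuming simulated data in place of ActualData and 200 replicates; to stay self-contained it fits the beta by maximum likelihood with optim() in a hypothetical helper fit_beta() rather than calling fitdistr(), which maximizes the same likelihood.

```r
set.seed(1)
# Hypothetical example data: proportions in (0, 1) (stand-in for ActualData)
x <- rbeta(60, shape1 = 2, shape2 = 5)

# Maximum-likelihood beta fit; shapes are optimised on the log scale so they stay positive
fit_beta <- function(y) {
  nll <- function(lp) -sum(dbeta(y, exp(lp[1]), exp(lp[2]), log = TRUE))
  exp(optim(c(0, 0), nll)$par)
}

# KS statistic D of y against a beta fitted to y itself
ks_stat <- function(y) {
  s <- fit_beta(y)
  unname(ks.test(y, "pbeta", shape1 = s[1], shape2 = s[2])$statistic)
}

d_obs <- ks_stat(x)
s_hat <- fit_beta(x)

# Parametric bootstrap: simulate from the fitted beta, refit, recompute D
B <- 200
d_boot <- replicate(B, ks_stat(rbeta(length(x), s_hat[1], s_hat[2])))
p_boot <- (1 + sum(d_boot >= d_obs)) / (B + 1)
p_boot
```

The refit inside ks_stat() matters: each simulated data set must go through the same estimation step as the real data, otherwise the bootstrap reproduces the same conservative bias.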
You can use AIC to compare whether another distribution fits your data better than the negative binomial does; the lowest AIC indicates the best fit among the distributions you tried. It is possible for your data to "pass" the chi-squared or Kolmogorov-Smirnov test for two different distributions, but one will still fit better than the other.
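A minimal sketch of that AIC comparison, assuming simulated overdispersed counts in place of ActualData. MASS provides a logLik() method for fitdistr objects, so AIC() can be applied to each fit directly; the set of candidates (Poisson, negative binomial, geometric) is only an illustration. A lower AIC means a better fit relative to the other candidates, not that any of them fits well in an absolute sense.

```r
library(MASS)
set.seed(7)
# Hypothetical example data: overdispersed counts (stand-in for ActualData)
counts <- rnbinom(150, size = 0.8, mu = 2)

fits <- list(
  Poisson   = fitdistr(counts, "Poisson"),
  negbin    = fitdistr(counts, "negative binomial"),
  geometric = fitdistr(counts, "geometric")
)
aics <- sapply(fits, AIC)   # AIC() works through MASS's logLik method for fitdistr
sort(aics)                  # smallest AIC = best-fitting candidate
names(which.min(aics))
```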
Erika Mudrak
-------------------------------------------
Erika Mudrak
Graduate Student
Department of Botany
University of Wisconsin-Madison
430 Lincoln Dr
Madison WI, 53706
608-265-2191
mudrak at wisc.edu
----- Original Message -----
From: "Capelle, Jacob" <Jacob.Capelle at wur.nl>
Date: Tuesday, May 12, 2009 11:00 am
Subject: [R-sig-eco] testing for distribution
To: r-sig-ecology at r-project.org
> Dear all,
>
> I have a somewhat theoretical question which I hope might
> interest you, and I hope someone can help me a bit.
>
> In order to obtain ecological (survey) data, I am trying to predict
> the accuracy of a sampling tool for estimating mussel
> density. For this reason I took a lot of samples at a fixed
> location and counted the number of mussels in each sample. Because
> mussels are aggregated on the sediment, I had a lot of zero values. To
> estimate the sample size I used a negative binomial distribution and obtained
> the k value and the mu from fitdistr(x, "negative binomial") (MASS).
>
> The question I have is: how can I test if this distribution accurately
> described my (zero inflated count) data?
>
> I am somewhat familiar with AIC, but since I only have counts on one
> variable I cannot perform a GLS.
> Creating a vector with rnbinom() using the k and mu from
> fitdistr(), I plotted a histogram and compared it with my data; this
> showed that it was roughly comparable, but I want to quantify this.
>
> I have a biological background, not a statistical one, so I realize I
> may ask silly questions.
> But I hope someone can give me some hints.
>
> Kind regards,
>
> Jacob Capelle
>
> PhD student
> Wageningen Imares
> The Netherlands
> jacob.capelle at wur.nl
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology