[R-sig-eco] testing for distribution

Wed May 13 20:17:59 CEST 2009

Jacob-  You can use a Chi-squared goodness-of-fit test, chisq.test(), for discrete distributions like the negative binomial, and a Kolmogorov-Smirnov test, ks.test(), for continuous distributions. Both produce a p-value for the null hypothesis that your data come from the given distribution with the stated parameters. Use the parameter estimates from your fitdistr() results. If p>0.05 (or 0.1 or whatever), you cannot reject that distribution; this does not prove your data come from it, only that they are consistent with it.

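For discrete distributions, try something like: 
fit=fitdistr(.....)
# chisq.test(x, y) with two raw vectors tests independence, not goodness of fit,
# so compare the observed counts with the expected probabilities from the fit.
vals=0:max(ActualData)
obs=table(factor(ActualData, levels=vals))
p=dnbinom(vals, size=fit$estimate["size"], mu=fit$estimate["mu"])
chisq.test(x=obs, p=p, rescale.p=TRUE)  # rescale.p: tail beyond max(ActualData) is cut off
#I think this is right, I haven't actually tried it...
# This is akin to quantitatively comparing your histograms...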
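
As a self-contained illustration, here is a minimal sketch of a chi-squared goodness-of-fit check against a fitted negative binomial. The data are simulated and the names counts, vals, observed and expected_p are made up for the example. Note that rnbinom() and dnbinom() call the k parameter "size". The p-value is only approximate: the parameters are estimated from the same data, and chisq.test() may warn when expected counts in the tail are small.

```r
library(MASS)  # for fitdistr()

set.seed(1)
# Simulated stand-in for the real counts (an assumption, not real data)
counts <- rnbinom(200, size = 0.8, mu = 3)

fit <- fitdistr(counts, "negative binomial")

# Observed frequency of each count value 0, 1, ..., max(counts)
vals <- 0:max(counts)
observed <- table(factor(counts, levels = vals))

# Expected probabilities under the fitted negative binomial
expected_p <- dnbinom(vals,
                      size = fit$estimate["size"],
                      mu   = fit$estimate["mu"])

# rescale.p = TRUE renormalises the probabilities, because the
# tail beyond max(counts) is not included in expected_p
gof <- chisq.test(x = observed, p = expected_p, rescale.p = TRUE)
gof$p.value
```

Pooling the sparse upper count classes into a single "k or more" class before testing would probably make the chi-squared approximation more trustworthy.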

For continuous distributions (such as beta), the code would be this: 
fit=fitdistr(...)
ks.test(ActualData, "pbeta", shape1=fit$estimate[1],shape2=fit$estimate[2])
# I've done this successfully
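
For reference, a minimal runnable sketch of the same idea on simulated data. The data, the starting values and the lower bounds are assumptions chosen for the example. fitdistr() needs start values for the beta distribution, and supplying lower keeps the optimiser away from invalid (non-positive) shape values. One caveat from the ks.test() help page: the parameters are supposed to be specified in advance rather than estimated from the same data, so the p-value here should be read as a rough guide only.

```r
library(MASS)  # for fitdistr()

set.seed(2)
# Simulated stand-in for continuous data on (0, 1) (an assumption)
x <- rbeta(150, shape1 = 2, shape2 = 5)

# Fit a beta distribution by maximum likelihood
fit <- fitdistr(x, "beta",
                start = list(shape1 = 1, shape2 = 1),
                lower = c(0.01, 0.01))

# One-sample Kolmogorov-Smirnov test against the fitted beta CDF
ks <- ks.test(x, "pbeta",
              shape1 = fit$estimate["shape1"],
              shape2 = fit$estimate["shape2"])
ks$p.value
```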

You can use AIC to check whether another distribution fits your data better than the negative binomial does.  I think it's possible for your data to "pass" the Chi-squared/Kolmogorov-Smirnov test for two different distributions while still fitting one better than the other. 
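
A minimal sketch of such an AIC comparison, assuming simulated overdispersed counts and using the Poisson as the competing distribution (both are choices made for the example). MASS supplies a logLik method for fitdistr objects, so AIC() can be called on the fits directly; the lower AIC indicates the better-supported distribution.

```r
library(MASS)  # for fitdistr()

set.seed(3)
# Simulated, overdispersed counts (an assumption, not real data)
counts <- rnbinom(200, size = 0.8, mu = 3)

fit_nb   <- fitdistr(counts, "negative binomial")
fit_pois <- fitdistr(counts, "Poisson")

# Collect the AIC of each candidate distribution
aic <- c(negbin = AIC(fit_nb), poisson = AIC(fit_pois))
aic

# Name of the candidate with the smallest AIC
names(which.min(aic))
```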

Erika Mudrak

-------------------------------------------
Erika Mudrak
Graduate Student
Department of Botany
University of Wisconsin-Madison
430 Lincoln Dr
Madison WI, 53706
608-265-2191
mudrak at wisc.edu

----- Original Message -----
From: "Capelle, Jacob" <Jacob.Capelle at wur.nl>
Date: Tuesday, May 12, 2009 11:00 am
Subject: [R-sig-eco]  testing for distribution
To: r-sig-ecology at r-project.org

> Dear all,
>  
> I have a somewhat theoretical question that I hope might interest 
> you and that you can hopefully help me with.
>  
> In order to obtain ecological (survey) data, I try to make a 
> prediction about the accuracy of a sampling tool to estimate mussel 
> density. For this reason I took a lot of samples at a certain fixed 
> location and counted the amount of mussels in each sample. Because 
> mussels are aggregated on the sediment, I had a lot of zero values. To 
> estimate the sample size I used a negative binomial distribution and obtained 
> the k value and the mu from the fitdistr(x,"negative binomial") (MASS).
>  
> The question I have is: how can I test if this distribution accurately 
> described my (zero inflated count) data?
>  
> I am a bit familiar with the AIC but since I only have counts on one 
> variable I cannot perform a GLS. 
> Creating a vector with rnbinom() using the k and mu from the 
> fitdistr() I plotted a histogram and compared it with my data, this 
> showed that it was roughly comparable, but I want to quantify this.
>  
> I have a biological background not a statistical one, so I realize I 
> can ask silly questions.
> But I hope someone can give me some hints. 
>  
> Kind regards,
>  
> Jacob Capelle
>  
> PhD student
> Wageningen Imares
> The Netherlands
> jacob.capelle at wur.nl