[R-sig-eco] trouble in fitting a distirbution (matteo dossena)

Fri Oct 7 17:21:35 CEST 2011

> Thanks Andy, thats really helpful..
> 
> I'm moving my first step into MLE, so far i've been using the binning method and fit linear regression using a mixed effect model to derive the average exponent across my replicates, but i'm not very happy with this approach, i need a more accurate way yo estimate the exponent. So i followed White et al 2008 and solved the MLE for mu (or lambda). I'm happy with that.. but my concern, and what's pointed me toward the approach i outlined in the previous post, is once i get the estimate of the exponent of the distribution of my variable for each replicate, would i be doing right if i compare it across treatments, and also i would asses the goodness of my fits?
> 
> but as i said I'm a beginner in all this, so i might miss out something rally obvious..i might find the answer in your works...
> 
> By the way the way the real data I'm trying to fit the distribution to are actually a body size distributions, which i assume is bounded (if bounded means with a lower and upper limits...actually the distribution i have to fit is a truncated pareto), so as you said can't be solved analytically..
> but I'll read your papers to make clearer to myself what I'm trying to do, and if you don't mind, let you know how the process goes on..
> 
> with regard to point 3, i apologise but there are errors in that line, as what i called location (which actually is a) and shape (which actually is lambda), were messed up the right way would have been 
> 
> density <- -(lambda+1)*(a^(lambda+1))*x^lambda
> 
> about the minus sign instead is there in the White et al 2007 tab. 1, which is from where i got the function of the PDF... but yes you are right, how can the density be negative...i might got this wrong too
> 
> thanks al lot
> m. 

Il giorno 7 Oct 2011, alle ore 01:20, Edwards, Andrew ha scritto:

> Dear Matteo,
> 
> I haven't looked into the details of the mle2 code, but three potential issues spring to mind.
> 
> 1. It looks like you're trying to find the maximum likelihood estimate (MLE) for each of the two parameters of the Pareto distribution (for which the probability density function is a power-law above some value, call it a, and zero elsewhere). I'll write the probability density function as
> 
> f(x) = C x^(- mu), x >= a, where C is the normalisation constant; C = (mu - 1) a^(mu - 1).
> 
> The MLEs can be found analytically, you don't need do it numerically. The MLE for mu is just
> 
> 1 - n / (n log a - sum (log x_i) )
> 
> where x_i are the data with sample size n.
> 
> The MLE for a is just the minimum of your x_i.
> 
> 2. Trying to estimate the MLE for a numerically may be what's causing trouble. The likelihood function value won't be have a zero gradient at the MLE (unlike what usually happens), because the gradient is always positive.
> 
> To see this, I actually have the log-likelihood function given as equation (4) in a recent paper (Using likelihood to test for Lévy flight search patterns and for general power-law distributions in nature, Journal of Animal Ecology 77:1212-1222. If you take that equation and differentiate with respect to a (which I presume is the same as your 'location' parameter), you just get
> 
> n (mu - 1) / a  
> 
> where n is the sample size and mu>1 is the power-law exponent term (I think the negative of your shape parameter). This derivative is therefore always positive, so does not have a local maximum, and so the maximum likelihood attainable happens when a is as large as possible. However, since a is, by definition, the minimum value of the distribution, the maximum value it can take is the minimum value of your data (since otherwise you would invalidate an assumption of your distribution).  
> 
> So if your numerical procedure is looking for a zero gradient of the likelihood function, it won't find it with respect to a, as the gradient is always positive.
> 
> 3. I'm wondering if the - sign in your line:
> 
>> 	density <- -(location+1)*(location^(shape+1))*x^shape
> 
> is correct, as I would have thought density should be positive.
> 
> I hope this helps. People across several fields have made fitting Pareto distributions much more complicated than it needs to be, but you're on the right track using likelihood. It's just that you can do it analytically, so there's no need for any optimisation. 
> 
> Also useful might be: 
> White, E.P., Enquist, B.J., & Green, J.L. (2008) On estimating the exponent of power-law frequency distributions. Ecology, 89, 905-912.
> 
> Edwards, A.M. (2011) Overturning conclusions of Lévy flight movement patterns by fishing boats and foraging animals. Ecology, 92(6):1247-1257. 
> 
> which both also discuss bounded distributions (which can't be solved analytically).
> 
> I hope that this helps,
> 
> Andy
> 
> -- 
> Dr. Andrew Edwards 
> Andrew.Edwards at dfo-mpo.gc.ca  **NEW** 
> http://www.chebucto.ns.ca/~english 
> 
> Pacific Biological Station, 
> Fisheries and Oceans Canada, 
> 3190 Hammond Bay Road, 
> Nanaimo, British Columbia, 
> V9T 6N7, Canada. 
> Tel: +1 250-756-7146 
> Fax: +1 250 756-7053 
> 
>> ------------------------------
>> 
>> Message: 3
>> Date: Wed, 5 Oct 2011 18:19:46 +0100
>> From: matteo dossena <matteo.dossena at gmail.com>
>> To: r-sig-ecology at r-project.org
>> Subject: [R-sig-eco] trouble in fitting a distirbution
>> Message-ID: <608BF773-D9AA-4CFD-A5C1-2C510ED0344A at gmail.com>
>> Content-Type: text/plain
>> 
>> Hi All,
>> 
>> I'm trying to fit a theoretical distribution to some data, 
>> following the procedure on Bolker's Ecological Models and 
>> Data in R, but I'm having some problem... could someone 
>> please point me out what I'm doing  wrong?
>> 
>> here is the code
>> 
>> #create a random variable 
>> x <- sort(rpareto(100,1,0.5))
>> 
>> #the pdf
>> pdf.pareto <- function(x,location,shape){
>> 	density <- -(location+1)*(location^(shape+1))*x^shape
>> 	return(density)}
>> 
>> #predict the probability according to my pdf
>> y <- pdf.pareto(x,1,0.5)
>> 
>> #check that pdf function does what it should
>> plot(x,y,typ="l")
>> 
>> #the negative log-likelihood function
>> NLL.pareto <- function(location,shape){
>> 	density <- pdf.pareto(x,location,shape)
>> 	-sum(dpareto(density,location=location,shape=shape,log=TRUE))}
>> 
>> #run the optimisation with mle2 to derive the parameters of the pdf
>> fit.pareto <- mle2(NLL.pareto,start=list(location=1,shape=0.5))
>> 
>> 
>> when i run it, I get this message:
>> Error in optim(par = c(1, 0.5), fn = function (p)  : 
>>  initial value in 'vmmin' is not finite
>> 
>> I was almost convinced that the reasoning behind was correct, 
>> and to show it to myself i did the same thing with a less 
>> "exotic" distribution. In that case the code doesn't crash 
>> but the estimate i get are pretty weird...I'm starting to 
>> think that I'm trying to do something totally wrong...  
>> really appreciate any help
>> thanks
>> matteo
>> 
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology