[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

Wed Jun 11 20:22:48 CEST 2008

Greetings, all

I am having difficulty getting the fitdistr() function to return without
an error on my data. Specifically, what I'm trying to do is get a
parameter estimation for fracture intensity data in a well / borehole.
Lower bound is 0 (no fractures in the selected data interval), and upper
bound is ~ 10 - 50, depending on what scale you are conducting the
analysis on.

I read in the data from a text file, convert it to numerics, and then
calculate initial estimates of the shape and scale parameters for the
gamma distribution from moments. I then feed this back into the
fitdistr() function.

R code (to this point):
########################################
data.raw=c(readLines("FSM_C_9m_ENE.inp"))
data.num <- as.numeric(data.raw)
data.num
library(MASS)
shape.mom = ((mean(data.num))/ (sd(data.num))^2
shape.mom
med.data = mean(data.num)
sd.data = sd(data.num)
med.data
sd.data
shape.mom = (med.data/sd.data)^2
shape.mom
scale.mom = (sd.data^2)/med.data
scale.mom
fitdistr(data.num,"gamma",list(shape=shape.mom,
scale=scale.mom),lower=0)

fitdistr() returns the following error:

" Error in optim(x = c(0.402707037, 0.403483333, 0.404383704,
2.432626667,  : 
  L-BFGS-B needs finite values of 'fn'"

Next thing I tried was to manually specify the negative log-likelihood
function and pass it straight to mle() (the method specified in Ricci's
tutorial on fitting distributions with R).  Basically, I got the same
result as using fitdistr().

Finally I tried using some R code I found from someone with a similar
problem back in 2003 from the archives of this mailing list:

R code
########################################
gamma.param1 <- shape.mom
gamma.param2 <- scale.mom
log.gamma.param1 <- log(gamma.param1)
log.gamma.param2 <- log(gamma.param2)

                                                       gammaLoglik <-
function(params, 
negative=TRUE){
   lglk <- sum(dgamma(data, shape=exp(params[1]), scale=exp(params[2]),
log=TRUE))
   if(negative)
      return(-lglk)
   else
      return(lglk)
}

optim.list <- optim(c(log.gamma.param1, log.gamma.param2), gammaLoglik)
gamma.param1 <- exp(optim.list$par[1])
gamma.param2 <- exp(optim.list$par[2])
#########################################

If I test this function using my sample data and the estimates of shape
and scale derived from the method of moments, gammaLogLike returns as
INF. I suspect the problem is that the zeros in the data are causing the
optim solver problems when it attempts to minimize the negative
log-likelihood function.

Can anyone suggest some advice on a work-around?  I have seen
suggestions online that a 'censoring' algorithm can allow one to use MLE
methods to estimate the gamma distribution for data with zero values
(Wilkes, 1990, Journal of Climate). I have not, however, found R code to
implement this, and, frankly, am not smart enough to do it myself... :-)

Any suggestions? Has anyone else run up against this and written code to
solve the problem?

Thanks in advance!

Aaron Fox
Senior Project Geologist, Golder Associates
+1 425 882 5484 || +1 425 736 3958 (mobile)
afox at golder.com || www.fracturedreservoirs.com