[R] Gamma distribution parameter estimation

Sun Aug 7 09:17:56 CEST 2011

Well, 'Alexander Engelhardt' failed to follow the posting guide, and 
so did not get a reply from some knowledgeable people.

(1) When using fitdistr you need to roughly scale your data: e.g. 
divide by 1000 here.

(2) fitdistr does 'go down the maximum likeilhood route' and it does 
find reasonable starting points.

The mean is *not* the mle for the rate, and Mr Harding's gamma pdf is 
parametrized by 'scale' not 'rate' (see the R help page).  Probably 
not its intention, but Harding's posting with its schoolboy errors 
exemplifies the inadvisibility of 'rolling your own'.

(3) In the same package there is a function gamma.shape to find the 
mle of the shape given the mean.  And the package is support for a 
book, and this should be obvious from the book's index.

(4) A value of 2750 for a shape parameter is totally implausible. 
Have you any idea at all what gamma(shape = 2750) looks like?

MASS's function gives

b <- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L ,1699L, 
4545L)
> fit <- glm(b ~ 1, family=Gamma()
> coef(fit)
  (Intercept)
0.0003391958
> gamma.shape(fit)

Alpha: 7.786827
SE:    2.411487

At that point you have mles of 1/mean and shape.  The mle of the 
rate is then
> 0.0003391958*7.786827
[1] 0.002641259

It's so much easier to use fitdistr:

> fitdistr(b/1000, "gamma")
      shape       rate
   7.7871427   2.6413668
  (2.4115976) (0.8449421)

And as a check of plausibility:
dx <- seq(0, 6000, len = 100)
plot(dx, dgamma(dx, shape=7.8, rate=2.6e-3), type = "l")
rug(b)

On Sun, 7 Aug 2011, ted.harding at wlandres.net wrote:

> On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:
>> On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:
>>> Hi Alex,
>>>
>>> Try
>>>
>>>> require(MASS)
>>> Loading required package: MASS
>>>> b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
>>> + 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
>>> + 1699L, 4545L)
>>>> fitdistr(b, 'gamma')
>>>        shape           rate
>>>    6.4528939045   0.0021887943
>>>   (0.7722567310) (0.0001854559)
>>>
>>
>> Hi,
>>
>> I'm getting a weird error here, pasted below. Do you know what
>> could cause that?
>>
>> > fitdistr(b,"gamma")
>> Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
>> 2353L, 1966L,  :
>>    non-finite finite-difference value [2]
>> In addition: Warning message:
>> In dgamma(x, shape, scale, log) : NaNs produced
>
> I'm surprised no=-one has suggested going straight down the
> maximum-likelihood route. The solution for the 'rate' parameter
> is immediate, while the solution for the 'shape' paramater
> needs the solution of an equation in one variable imvolving
> the digamma function psi(k).
>
> Writing the density of the gamma distribution as
>
>  (1/G(k))*(x^(k-1))*exp(x/r)/(r^k)
>
> where k is the shape parameter and r is the rate, and
> G(k) denotes the gamma function Gamma(k), we have for
> the estimates
>
> [1]  r = mean(x.i)
>
> [2]  psi(k) = mean(log(x.i))
>
> where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
> the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
> denotes first derivative).
>
> So you just need to solve [2] for k.
>
> This is well summarised in the Wikipedia article:
>
> http://en.wikipedia.org/wiki/
> Gamma_distribution#Maximum_likelihood_estimation
>
> in the Section "Parameter estimation".
>
> Since the digamma function digamma() is in the base package,
> all you need to do is submit it to a root-finder, here done
> in the naivest possible way.
>
> With your values of
>
>  x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
>        2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
>        3001L, 2329L, 3944L, 2568L, 1699L, 4545L)
>
> you get
>
>  mean(log(x))
>  # [1] 7.92335
>
> which is quite a modest number (though in a somewhat tense
> relationship with the digamma function).
>
> A bit of experimentation with plot() will lead you home:
>
>  t<-(50:200);     plot(t,digamma(t))
>  t<-(100:2000);   plot(t,digamma(t))
>  t<-20*(100:200); plot(t,digamma(t))
>
> from which the root is somewhere near 2750.
>
>  t<-(2700:2800); plot(t,digamma(t))
>  t<-(2745:2865); plot(t,digamma(t))
>  t<-2755+0.1*(0:100); plot(t,digamma(t))
>
> So:
>
>  mean(log(x)) - digamma(2760)
>  # [1] 0.0005452383
>  mean(log(x)) - digamma(2769)
>  # [1] -0.002710915
>  mean(log(x)) - digamma(2765)
>  # [1] -0.001265045
>  mean(log(x)) - digamma(2763)
>  # [1] -0.0005413247
>  mean(log(x)) - digamma(2762)
>  # [1] -0.0001792682
>  mean(log(x)) - digamma(2761)
>  # [1] 0.0001829194
>  mean(log(x)) - digamma(2761.5)
>  # [1] 1.809230e-06
>  mean(log(x)) - digamma(2761.75)
>  # [1] -8.873357e-05
>  mean(log(x)) - digamma(2761.625)
>  # [1] -4.34632e-05
>  mean(log(x)) - digamma(2761.5625)
>  # [1] -2.082724e-05
>  mean(log(x)) - digamma(2761.53125)
>  # [1] -9.509068e-06
>  mean(log(x)) - digamma(2761.515625)
>  # [1] -3.849935e-06
>  mean(log(x)) - digamma(2761.5078125)
>  # [1] -1.020356e-06
>  mean(log(x)) - digamma(2761.50390625)
>  # [1] 3.944361e-07
>  mean(log(x)) - digamma(2761.505859375)
>  # [1] -3.129603e-07
>  mean(log(x)) - digamma(2761.5048828125)
>  # [1] 4.073787e-08
>
> And so it goes ... Anyway, to 7 significant figures,
> the answer is that k = 2761.505, and that is "by hand"
> using a simple "interval-halving" procedure.
>
> I don't know why fitdistr() threw an error. It may be
> that the starting-value for the iterations was to far
> off and threw it into some difficult space. The good
> starting value above was obtained by a bit of graphical
> trial and error. Maybe something similar can be emulated
> in R?
>
> Hoping this helps,
> Ted.
>
> PS
> Oh, and while I'm at it, the estimate of the rate is
>
>  r = mean(x) = 2948.15
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Aug-11                                       Time: 02:03:29
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595