[R] Gamma distribution parameter estimation

(Ted Harding) ted.harding at wlandres.net
Sun Aug 7 03:03:32 CEST 2011


On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:
> On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:
>> Hi Alex,
>>
>> Try
>>
>>> require(MASS)
>> Loading required package: MASS
>>> b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
>> + 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
>> + 1699L, 4545L)
>>> fitdistr(b, 'gamma')
>>        shape           rate
>>    6.4528939045   0.0021887943
>>   (0.7722567310) (0.0001854559)
>>
> 
> Hi,
> 
> I'm getting a weird error here, pasted below. Do you know what
> could cause that?
> 
>  > fitdistr(b,"gamma")
> Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
> 2353L, 1966L,  :
>    non-finite finite-difference value [2]
> In addition: Warning message:
> In dgamma(x, shape, scale, log) : NaNs produced

I'm surprised no=-one has suggested going straight down the
maximum-likelihood route. The solution for the 'rate' parameter
is immediate, while the solution for the 'shape' paramater
needs the solution of an equation in one variable imvolving
the digamma function psi(k).

Writing the density of the gamma distribution as

  (1/G(k))*(x^(k-1))*exp(x/r)/(r^k)

where k is the shape parameter and r is the rate, and
G(k) denotes the gamma function Gamma(k), we have for
the estimates

[1]  r = mean(x.i)

[2]  psi(k) = mean(log(x.i))

where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
denotes first derivative).

So you just need to solve [2] for k.

This is well summarised in the Wikipedia article:

http://en.wikipedia.org/wiki/
Gamma_distribution#Maximum_likelihood_estimation

in the Section "Parameter estimation".

Since the digamma function digamma() is in the base package,
all you need to do is submit it to a root-finder, here done
in the naivest possible way.

With your values of

  x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
        2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
        3001L, 2329L, 3944L, 2568L, 1699L, 4545L)

you get

  mean(log(x))
  # [1] 7.92335

which is quite a modest number (though in a somewhat tense
relationship with the digamma function).

A bit of experimentation with plot() will lead you home:

  t<-(50:200);     plot(t,digamma(t))
  t<-(100:2000);   plot(t,digamma(t))
  t<-20*(100:200); plot(t,digamma(t))

from which the root is somewhere near 2750.

  t<-(2700:2800); plot(t,digamma(t))
  t<-(2745:2865); plot(t,digamma(t))
  t<-2755+0.1*(0:100); plot(t,digamma(t))

So:

  mean(log(x)) - digamma(2760)
  # [1] 0.0005452383
  mean(log(x)) - digamma(2769)
  # [1] -0.002710915
  mean(log(x)) - digamma(2765)
  # [1] -0.001265045
  mean(log(x)) - digamma(2763)
  # [1] -0.0005413247
  mean(log(x)) - digamma(2762)
  # [1] -0.0001792682
  mean(log(x)) - digamma(2761)
  # [1] 0.0001829194
  mean(log(x)) - digamma(2761.5)
  # [1] 1.809230e-06
  mean(log(x)) - digamma(2761.75)
  # [1] -8.873357e-05
  mean(log(x)) - digamma(2761.625)
  # [1] -4.34632e-05
  mean(log(x)) - digamma(2761.5625)
  # [1] -2.082724e-05
  mean(log(x)) - digamma(2761.53125)
  # [1] -9.509068e-06
  mean(log(x)) - digamma(2761.515625)
  # [1] -3.849935e-06
  mean(log(x)) - digamma(2761.5078125)
  # [1] -1.020356e-06
  mean(log(x)) - digamma(2761.50390625)
  # [1] 3.944361e-07
  mean(log(x)) - digamma(2761.505859375)
  # [1] -3.129603e-07
  mean(log(x)) - digamma(2761.5048828125)
  # [1] 4.073787e-08

And so it goes ... Anyway, to 7 significant figures,
the answer is that k = 2761.505, and that is "by hand"
using a simple "interval-halving" procedure.

I don't know why fitdistr() threw an error. It may be
that the starting-value for the iterations was to far
off and threw it into some difficult space. The good
starting value above was obtained by a bit of graphical
trial and error. Maybe something similar can be emulated
in R?

Hoping this helps,
Ted.

PS
Oh, and while I'm at it, the estimate of the rate is

  r = mean(x) = 2948.15


--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-11                                       Time: 02:03:29
------------------------------ XFMail ------------------------------



More information about the R-help mailing list