[R] Gamma distribution parameter estimation
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Aug 7 09:17:56 CEST 2011
Well, 'Alexander Engelhardt' failed to follow the posting guide, and
so did not get a reply from some knowledgeable people.
(1) When using fitdistr you need to roughly scale your data: e.g.
divide by 1000 here.
(2) fitdistr does 'go down the maximum likeilhood route' and it does
find reasonable starting points.
The mean is *not* the mle for the rate, and Mr Harding's gamma pdf is
parametrized by 'scale' not 'rate' (see the R help page). Probably
not its intention, but Harding's posting with its schoolboy errors
exemplifies the inadvisibility of 'rolling your own'.
(3) In the same package there is a function gamma.shape to find the
mle of the shape given the mean. And the package is support for a
book, and this should be obvious from the book's index.
(4) A value of 2750 for a shape parameter is totally implausible.
Have you any idea at all what gamma(shape = 2750) looks like?
MASS's function gives
b <- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L ,1699L,
4545L)
> fit <- glm(b ~ 1, family=Gamma()
> coef(fit)
(Intercept)
0.0003391958
> gamma.shape(fit)
Alpha: 7.786827
SE: 2.411487
At that point you have mles of 1/mean and shape. The mle of the
rate is then
> 0.0003391958*7.786827
[1] 0.002641259
It's so much easier to use fitdistr:
> fitdistr(b/1000, "gamma")
shape rate
7.7871427 2.6413668
(2.4115976) (0.8449421)
And as a check of plausibility:
dx <- seq(0, 6000, len = 100)
plot(dx, dgamma(dx, shape=7.8, rate=2.6e-3), type = "l")
rug(b)
On Sun, 7 Aug 2011, ted.harding at wlandres.net wrote:
> On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:
>> On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:
>>> Hi Alex,
>>>
>>> Try
>>>
>>>> require(MASS)
>>> Loading required package: MASS
>>>> b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
>>> + 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
>>> + 1699L, 4545L)
>>>> fitdistr(b, 'gamma')
>>> shape rate
>>> 6.4528939045 0.0021887943
>>> (0.7722567310) (0.0001854559)
>>>
>>
>> Hi,
>>
>> I'm getting a weird error here, pasted below. Do you know what
>> could cause that?
>>
>> > fitdistr(b,"gamma")
>> Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
>> 2353L, 1966L, :
>> non-finite finite-difference value [2]
>> In addition: Warning message:
>> In dgamma(x, shape, scale, log) : NaNs produced
>
> I'm surprised no=-one has suggested going straight down the
> maximum-likelihood route. The solution for the 'rate' parameter
> is immediate, while the solution for the 'shape' paramater
> needs the solution of an equation in one variable imvolving
> the digamma function psi(k).
>
> Writing the density of the gamma distribution as
>
> (1/G(k))*(x^(k-1))*exp(x/r)/(r^k)
>
> where k is the shape parameter and r is the rate, and
> G(k) denotes the gamma function Gamma(k), we have for
> the estimates
>
> [1] r = mean(x.i)
>
> [2] psi(k) = mean(log(x.i))
>
> where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
> the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
> denotes first derivative).
>
> So you just need to solve [2] for k.
>
> This is well summarised in the Wikipedia article:
>
> http://en.wikipedia.org/wiki/
> Gamma_distribution#Maximum_likelihood_estimation
>
> in the Section "Parameter estimation".
>
> Since the digamma function digamma() is in the base package,
> all you need to do is submit it to a root-finder, here done
> in the naivest possible way.
>
> With your values of
>
> x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
> 2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
> 3001L, 2329L, 3944L, 2568L, 1699L, 4545L)
>
> you get
>
> mean(log(x))
> # [1] 7.92335
>
> which is quite a modest number (though in a somewhat tense
> relationship with the digamma function).
>
> A bit of experimentation with plot() will lead you home:
>
> t<-(50:200); plot(t,digamma(t))
> t<-(100:2000); plot(t,digamma(t))
> t<-20*(100:200); plot(t,digamma(t))
>
> from which the root is somewhere near 2750.
>
> t<-(2700:2800); plot(t,digamma(t))
> t<-(2745:2865); plot(t,digamma(t))
> t<-2755+0.1*(0:100); plot(t,digamma(t))
>
> So:
>
> mean(log(x)) - digamma(2760)
> # [1] 0.0005452383
> mean(log(x)) - digamma(2769)
> # [1] -0.002710915
> mean(log(x)) - digamma(2765)
> # [1] -0.001265045
> mean(log(x)) - digamma(2763)
> # [1] -0.0005413247
> mean(log(x)) - digamma(2762)
> # [1] -0.0001792682
> mean(log(x)) - digamma(2761)
> # [1] 0.0001829194
> mean(log(x)) - digamma(2761.5)
> # [1] 1.809230e-06
> mean(log(x)) - digamma(2761.75)
> # [1] -8.873357e-05
> mean(log(x)) - digamma(2761.625)
> # [1] -4.34632e-05
> mean(log(x)) - digamma(2761.5625)
> # [1] -2.082724e-05
> mean(log(x)) - digamma(2761.53125)
> # [1] -9.509068e-06
> mean(log(x)) - digamma(2761.515625)
> # [1] -3.849935e-06
> mean(log(x)) - digamma(2761.5078125)
> # [1] -1.020356e-06
> mean(log(x)) - digamma(2761.50390625)
> # [1] 3.944361e-07
> mean(log(x)) - digamma(2761.505859375)
> # [1] -3.129603e-07
> mean(log(x)) - digamma(2761.5048828125)
> # [1] 4.073787e-08
>
> And so it goes ... Anyway, to 7 significant figures,
> the answer is that k = 2761.505, and that is "by hand"
> using a simple "interval-halving" procedure.
>
> I don't know why fitdistr() threw an error. It may be
> that the starting-value for the iterations was to far
> off and threw it into some difficult space. The good
> starting value above was obtained by a bit of graphical
> trial and error. Maybe something similar can be emulated
> in R?
>
> Hoping this helps,
> Ted.
>
> PS
> Oh, and while I'm at it, the estimate of the rate is
>
> r = mean(x) = 2948.15
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Aug-11 Time: 02:03:29
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list