[R] distribution fitting
Tortonesi Mauro
mauro.tortonesi at unife.it
Thu Oct 23 13:40:45 CEST 2008
Dear R-help readers,
I am writing to you in order to ask you a few questions about
distribution fitting in R.
I am trying to find out whether the set of event interarrival times
that I am currently analyzing is distributed with a Gamma or General
Pareto distribution. The event arrival granularity is in minutes and
interarrival times are in seconds, so the values I have are 0, 60,
120, 180, and so on...
Here is what I did. Since several values of the interarrival_times
array are zero, to avoid numerical errors when calculating their
logarithm, I set them to 1E-10:
> # fix numerical values
> for (i in 1:length(interarrival_times)) {
+ if (interarrival_times[i] == 0)
+ interarrival_times[i] = 0.0000000001
+ }
Then I defined the negative log likelihood for the Gamma distribution:
> nll_gamma_log <- function(lambda_log, alfa_log) {
+ n <- length(interarrival_times)
+ x <- interarrival_times
+ -n*exp(alfa_log)*lambda_log + n*log(gamma(exp(alfa_log))) -
(exp(alfa_log)-1)*sum(log(x)) + exp(lambda_log)*sum(x)
+ }
and passed it to the mle function:
> est_gamma <- mle(minuslog = nll_gamma_log, start = list(lambda_log = 0, alfa_log=0))
There were 50 or more warnings (use warnings() to see the first 50)
of course I got many "value out of range" warnings:
> warnings()
Warning messages:
1: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'
[...]
7: In gamma(exp(log_alfa)) ... : NaNs produced
[...]
50: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'
but that was expected, right?
The real problems come out when I try to fit a General Pareto
distribution. Using the same method, I calculated the negative log
likelihood function for the General Pareto distribution:
> # negative log-likelihood function for general pareto distribution
> nll_gpd <- function(xi, log_sigma, mu) { # shape, scale, location
+ n <- length(interarrival_times)
+ x <- interarrival_times
+ cat("xi:",xi,"; log_sigma:",log_sigma,"; mu:",mu,"\n")
+ n*log_sigma + (xi+1)*sum(log(1+xi*(x-mu)/exp(log_sigma)))/xi
+ }
and passed it to the mle function. However, this time I get some errors:
> est_gpd <- mle(minuslog = nll_gpd, start = list(xi = 1, log_sigma = 1, mu = 1))
xi: 1 ; log_sigma: 1 ; mu: 1
xi: 1.001 ; log_sigma: 1 ; mu: 1
xi: 0.999 ; log_sigma: 1 ; mu: 1
xi: 1 ; log_sigma: 1.001 ; mu: 1
xi: 1 ; log_sigma: 0.999 ; mu: 1
xi: 1 ; log_sigma: 1 ; mu: 1.001
xi: 1 ; log_sigma: 1 ; mu: 0.999
xi: 52007.84 ; log_sigma: 13414.95 ; mu: 4241.565
xi: 10402.37 ; log_sigma: 2683.789 ; mu: 849.113
xi: 18.08721 ; log_sigma: 3.862682 ; mu: 2.627865
xi: 18.08721 ; log_sigma: 3.860682 ; mu: 2.627865
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [2]
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I also get many warnings:
> warnings()
Warning messages:
1: In base::log(x, base) ... : NaNs produced
2: In base::log(x, base) ... : NaNs produced
[...]
50: In base::log(x, base) ... : NaNs produced
I really don't know how to fix this problem. Do you have any suggestions?
Thank you very much in advance.
Best regards,
Mauro Tortonesi
--
Mauro Tortonesi, Ph.D.
Research Assistant
Distributed Systems Research Group
Engineering Department
University of Ferrara
More information about the R-help
mailing list