[R] Fitdistr and mle
Ben Bolker
bbolker at gmail.com
Thu Dec 26 15:39:01 CET 2013
On 13-12-26 04:13 AM, Tia Borrelli wrote:
>
> Thank you, this is the code I'm running. It is very simple, but my problem
> was with the interpretation of the difference between the functions.
>
> library(fImport)
> data.oggi = Sys.timeDate()
> ftse_mib = yahooSeries("FTSEMIB.MI", from="2009-09-01", to=data.oggi)
> portaf <- ftse_mib[,6]
> mrk_ftse <- portaf$FTSEMIB.MI.Adj.Close
> returns(mrk_ftse)
> library(quantmod)
> ret <- dailyReturn(portaf[,1])
> library(MASS)
> fitting <- fitdistr(ret,densfun="normal")
> print(c(mean(ret),sd(ret)))
> library(stats4)
> loglik <- function(media=0, devstd=1){
> -sum(dnorm(ret, mean=media, sd=devstd, log=TRUE))
> }
> mle(loglik)
>
Thank you.
I was wrong: a closer look inside the guts of MASS::fitdistr (which
you can do yourself) reveals that for densfun="normal" it is in fact
just calculating the mean and standard deviation of the data (and
rescaling the standard deviation by sqrt((n-1)/n) to convert from the
usual n-1 denominator estimate given by sd() to the MLE, which divides
by n). In contrast, mle() is using numerical optimization.
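As a rough check of the closed-form claim, here is a minimal sketch on
simulated data. The vector x is a made-up stand-in for the returns, not
the FTSE MIB series, and the seed and sample size are arbitrary.

```r
library(MASS)

set.seed(1)                              # arbitrary seed, for repeatability only
x <- rnorm(1000, mean = 0, sd = 0.017)   # simulated stand-in for the returns
n <- length(x)

fit <- fitdistr(x, densfun = "normal")

# Closed-form MLEs: the sample mean, and sd() rescaled from the
# n-1 denominator to the n denominator
closed_form <- c(mean = mean(x), sd = sd(x) * sqrt((n - 1) / n))

print(fit$estimate)
print(closed_form)
print(all.equal(unname(fit$estimate), unname(closed_form)))
```

If the two printed vectors agree, no optimizer was involved in the
fitdistr result for this case.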
If you look at the coefficients:
            fitdistr         mle
  mean  -3.270946e-06   -2.332673e-06
  sd     1.692252e-02    1.697018e-02

and the standard errors shown by fitdistr

        mean             sd
  (5.102331e-04)   (3.607893e-04)

you'll see that although the mle and fitdistr estimates of the mean
differ a lot in *relative* terms (the magnitude of the fitdistr estimate
is about 40% larger than that from mle), relative to the standard errors
the differences are tiny (the difference in the mean estimates is more
than two orders of magnitude smaller than the standard error of the
mean). It is a fact of life when you're
doing numerical optimization that you're never going to get exactly the
right answer, due to various forms of numerical "fuzz" -- you just have
to know enough about the methods, and about your problem, to know
whether the answers are correct within sensible error bounds.
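
One way to make that comparison concrete is to express the gap between
the two fits in units of the fitdistr standard errors. This is only a
sketch: the numbers are copied by hand from the output quoted above, and
the variable names are made up for illustration.

```r
# Values copied from the output quoted in this message
est_fitdistr <- c(mean = -3.270946e-06, sd = 1.692252e-02)
est_mle      <- c(mean = -2.332673e-06, sd = 1.697018e-02)
se_fitdistr  <- c(mean =  5.102331e-04, sd = 3.607893e-04)

# Absolute difference between the two fits, in standard-error units
z <- abs(est_fitdistr - est_mle) / se_fitdistr
print(signif(z, 2))
```

Both ratios come out well below 1, which is the sense in which the two
sets of estimates agree within sensible error bounds.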
>
> On Wednesday, 25 December 2013 17:49, Ben Bolker <bbolker at gmail.com>
> wrote:
> Tia Borrelli <tiaborrelli <at> yahoo.it> writes:
>
>> Thanks for answering. In ret I have the returns of the FTSE MIB (the
>> benchmark stock market index in Italy), and I'm estimating the
>> parameters of the distribution of the index returns using
>> different methods.
>
> OK, but this still isn't a *reproducible* example (see e.g.
> http://tinyurl.com/reproducible-000)
>
>> I need the MLE and I found these two functions, and I could not
>> understand why the results were different. Is it possible that I
>> obtain different results because in mle() I don't need to know
>> the original distribution and in fitdistr() I don't need to know
>> the function I have to maximize?
>
> In your example fitdistr() and mle() are doing the same thing under
> the hood, i.e. using the built-in optim() function to minimize a
> negative log-likelihood function based on the built-in dnorm().
> fitdistr() picks the distribution for you based on your specification
> of which distribution to use; mle() requires you to specify the
> negative log-likelihood function (the mle2() function in the bbmle
> package is an extension of stats4::mle that offers a middle ground,
> e.g. you can say y ~ dnorm(mu,sigma) to specify the fit of a Normal
> distribution). The differences between the results you get will be
> based on small numerical differences, e.g. the starting values of the
> parameters, or differences in the control parameters for optimization.
> In general you should get very similar, but not necessarily identical,
> answers from these two functions; big differences would probably
> indicate some kind of wonky data or numerical problem. Again, we
> would need a reproducible example to see precisely what is going on.
>