[R] maximum likelihood accuracy - comparison with Stata

Wed Mar 30 14:55:49 CEST 2011

On 28 March 2011 17:08, Peter Ehlers <ehlers at ucalgary.ca> wrote:
> On 2011-03-27 21:37, Alex Olssen wrote:
>>
>> Hi everyone,
>>
>> I am looking to do some manual maximum likelihood estimation in R.  I
>> have done a lot of work in Stata and so I have been using output
>> comparisons to get a handle on what is happening.
>>
>> I estimated a simple linear model in R with   lm()   and also my own
>> maximum likelihood program.  I then compared the output with Stata.
>> Two things jumped out at me.
>>
>> Firstly, in Stata my coefficient estimates are identical in both the
>> OLS estimation by   -reg-  and the maximum likelihood estimation using
>> Stata's   ml lf  command.
>> In R my coefficient estimates differ slightly between   lm()   and my
>> own maximum likelihood estimation.
>>
>> Secondly, the estimates for   sigma2   are very different between R
>> and Stata.  3.14 in R compared to 1.78 in Stata.
>>
>> I have copied my maximum likelihood program below.  It is copied from
>> a great intro to MLE in R by Macro Steenbergen
>> http://artsci.wustl.edu/~jmonogan/computing/r/MLE_in_R.pdf
>>
>> Any comments are welcome.  In particular I would like to know why the
>> estimate of   sigma2   is so different.  I would also like to know
>> about the accuracy of the coefficient estimates.
>
> Some comments:
>
> 1.
> Since Kmenta is not in the datasets package, you should mention
> where you're getting it. (I suppose that for economists, it's
> obvious, but we're not all economists.) I used the version in
> the systemfit package.
>
> 2.
> I don't believe your Stata value for sigma2 (by which I assume
> you mean sigma^2). Are you quoting sigma?
>
> 3.
> I can't reproduce your R value of 3.14 for sigma2. I get 3.725391.
>
> 4.
> (most important) There's a difference between the simple ML
> estimate (which is biased) and R's unbiased estimate for sigma^2.
> This dataset has 20 cases, so try multiplying your result by 20/17.
>
> 5.
> As to any difference in coefficients, I would guess that the
> difference is slight (you don't show what it is); it
> may be due to the fact that optim() produces approximate
> solutions, whereas in this simple case, an 'exact' solution
> is possible (and found by R).
>
> 6.
> In case you weren't aware of it: the stats4 package has an
> mle() function.

... and there is the "maxLik" package.

http://cran.r-project.org/package=maxLik

http://www.springerlink.com/content/973512476428614p/

Best wishes from Copenhagen,
Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name