[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

Alex Olssen alex.olssen at gmail.com
Mon May 9 22:47:28 CEST 2011


Wow that is really interesting,

Sorry I was asleep when you emailed these.

And yes, of course, I had been trying to implement model 18, not 18s,
that was a typo, sorry.

I will have a look at the code you posted.

Thanks,

Alex

On 10 May 2011 02:18, peter dalgaard <pdalgd at gmail.com> wrote:
> Hmm, I tried replacing the x's in the model with their principal component scores, and suddenly everything converges as a greased lightning:
>
>> Z <- princomp(cbind(x1,x2,x3))$scores
>> Z <- as.data.frame(princomp(cbind(x1,x2,x3))$scores)
>> names(Z)<- letters[1:3]
>> optimx(rep(0,8),lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=Z$a, x2=Z$b, x3=Z$c, method="BFGS")
>                                                                                                par
> 1 0.59107682, 0.01043568, -0.20639153, 0.25902746, 0.24675162, -0.01426477, 0.18045177, -0.23657327
>    fvalues method fns grs itns conv KKT1 KKT2 xtimes
> 1 -76.73878   BFGS 157  37 NULL    0 TRUE TRUE  0.055
> Warning messages:
> 1: In max(logpar) : no non-missing arguments to max; returning -Inf
> 2: In min(logpar) : no non-missing arguments to min; returning Inf
>
> The likelihood appears to be spot on, but, obviously, the parameter estimates need to be back-transformed to the original x1,x2,x3 system. I'll leave that as the proverbial "exercise for the reader"...
>
> However, the whole thing leads me to a suspicion that maybe our numerical gradients are misbehaving in cases with high local collinearity?
>
> -pd
>
> On May 9, 2011, at 13:40 , Alex Olssen wrote:
>
>> Hi Mike,
>>
>> Mike said
>> "is this it, page 1559?"
>>
>> That is the front page yes, page 15*6*9 has the table, of which the
>> model labelled 18s is the one I replicated.
>>
>> "did you actually cut/paste code anywhere and is your first
>> coefficient -.19 or -.019?
>> Presumably typos would be one possible problem."
>>
>> -0.19 is not a typo, it is pi10 in the paper, and a1 in my Stata
>> estimation - as far as I can tell cutting and pasting is not the
>> problem.
>>
>> "generally it helps if we could at least see the equations to check your
>> code against typos ( note page number ?) in lnl that may fix part of the
>> mystery.  Is full text available
>> on author's site, doesn't come up on citeseer AFAICT,"
>>
>> Unfortunately I do not know of any place to get the full version of
>> this paper that doesn't require access to a database such as JSTOR.
>> The fact that this likelihood function reproduces the published
>> results in Stata makes me confident that it is correct - I also have
>> read a lot on systems estimation and it is a pretty standard
>> likelihood function.
>>
>> "I guess one question would be " what is beta" in lnl supposed to be -
>> it isn't used anywhere but I will also mentioned I'm not that familiar
>> with the R code ( I'm trying to work through this to learn R and the
>> optimizers).
>>
>> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
>> e1 and e2 supposed to be?"
>>
>> The code above is certainly not as elegant as it could be - in this
>> case the beta is rather superfluous.  It is just a vector to hold
>> parameters - but since I called all the parameters theta anyway there
>> is no need for it.  e1 and e2 are the residuals from the first and
>> second equations of the system.  Sigma is a 2x2 matrix which is the
>> outer product of the two vectors of residuals.
>>
>> Kind regards,
>>
>> Alex
>>
>>
>>
>> On 9 May 2011 23:12, Mike Marchywka <marchywka at hotmail.com> wrote:
>>>
>>>
>>>
>>>
>>>
>>>
>>> ----------------------------------------
>>>> Date: Mon, 9 May 2011 22:06:38 +1200
>>>> From: alex.olssen at gmail.com
>>>> To: pdalgd at gmail.com
>>>> CC: r-help at r-project.org; davef at otter-rsch.com
>>>> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
>>>>
>>>> Peter said
>>>>
>>>> "Ahem! You might get us interested in your problem, but not to the
>>>> level that we are going to install Stata and Tsp and actually dig out
>>>> and study the scientific paper you are talking about. Please cite the
>>>> results and explain the differences."
>>>>
>>>> Apologies Peter, will do,
>>>>
>>>> The results which I can emulate in Stata but not (yet) in R are reported below.
>>>
>>> did you actually cut/paste code anywhere and is your first coefficient -.19 or -.019?
>>> Presumably typos would be one possible problem.
>>>
>>>> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569
>>>
>>>
>>> is this it, page 1559?
>>>
>>> http://www.jstor.org/pss/1913396
>>>
>>> generally it helps if we could at least see the equations to check your
>>> code against typos ( note page number ?) in lnl that may fix part of the
>>> mystery.  Is full text available
>>> on author's site, doesn't come up on citeseer AFAICT,
>>>
>>>
>>> http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate
>>>
>>> I guess one question would be " what is beta" in lnl supposed to be -
>>> it isn't used anywhere but I will also mentioned I'm not that familiar
>>> with the R code ( I'm trying to work through this to learn R and the optimizers).
>>>
>>> maybe some words would help, is sigma supposed to be 2x2 or 8x8 and what are
>>> e1 and e2 supposed to be?
>>>
>>>
>>>
>>>
>>>>
>>>> TABLE II - model 18s
>>>>
>>>> coef std err
>>>> p10 -0.19 0.078
>>>> p11 0.220 0.019
>>>> p12 -0.148 0.021
>>>> p13 -0.072
>>>> p20 0.893 0.072
>>>> p21 -0.148
>>>> p22 0.050 0.035
>>>> p23 0.098
>>>>
>>>> The results which I produced in Stata are reported below.
>>>> I spent the last hour rewriting the code to reproduce this - since I
>>>> am now at home and not at work :(
>>>> My results are "identical" to those published. The estimates are for
>>>> a 3 equation symmetrical singular system.
>>>> I have not bothered to report symmetrical results and have backed out
>>>> an extra estimate using adding up constraints.
>>>> I have also backed out all standard errors using the delta method.
>>>>
>>>> . ereturn display
>>>> ------------------------------------------------------------------------------
>>>> | Coef. Std. Err. z P>|z| [95% Conf. Interval]
>>>> -------------+----------------------------------------------------------------
>>>> a |
>>>> a1 | -.0188115 .0767759 -0.25 0.806 -.1692895 .1316664
>>>> a2 | .8926598 .0704068 12.68 0.000 .7546651 1.030655
>>>> a3 | .1261517 .0590193 2.14 0.033 .010476 .2418275
>>>> -------------+----------------------------------------------------------------
>>>> g |
>>>> g11 | .2199442 .0184075 11.95 0.000 .183866 .2560223
>>>> g12 | -.1476856 .0211982 -6.97 0.000 -.1892334 -.1061378
>>>> g13 | -.0722586 .0145154 -4.98 0.000 -.1007082 -.0438089
>>>> g22 | .0496865 .0348052 1.43 0.153 -.0185305 .1179034
>>>> g23 | .0979991 .0174397 5.62 0.000 .0638179 .1321803
>>>> g33 | -.0257405 .0113869 -2.26 0.024 -.0480584 -.0034226
>>>> ------------------------------------------------------------------------------
>>>>
>>>> In R I cannot get results like this - I think it is probably to do
>>>> with my inability at using the optimisers well.
>>>> Any pointers would be appreciated.
>>>>
>>>> Peter said "Are we maximizing over the same parameter space? You say
>>>> that the estimates from the paper gives a log-likelihood of 54.04, but
>>>> the exact solution clocked in at 76.74, which in my book is rather
>>>> larger."
>>>>
>>>> I meant +54.04 > -76.74. It is quite common to get positive
>>>> log-likelihoods in these system estimation.
>>>>
>>>> Kind regards,
>>>>
>>>> Alex
>>>>
>>>> On 9 May 2011 19:04, peter dalgaard  wrote:
>>>>>
>>>>> On May 9, 2011, at 06:07 , Alex Olssen wrote:
>>>>>
>>>>>> Thank you all for your input.
>>>>>>
>>>>>> Unfortunately my problem is not yet resolved.  Before I respond to
>>>>>> individual comments I make a clarification:
>>>>>>
>>>>>> In Stata, using the same likelihood function as above, I can reproduce
>>>>>> EXACTLY (to 3 decimal places or more, which is exactly considering I
>>>>>> am using different software) the results from model 8 of the paper.
>>>>>>
>>>>>> I take this as an indication that I am using the same likelihood
>>>>>> function as the authors, and that it does indeed work.
>>>>>> The reason I am trying to estimate the model in R is because while
>>>>>> Stata reproduces model 8 perfectly it has convergence
>>>>>> difficulties for some of the other models.
>>>>>>
>>>>>> Peter Dalgaard,
>>>>>>
>>>>>> "Better starting values would help. In this case, almost too good
>>>>>> values are available:
>>>>>>
>>>>>> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>>>>>>
>>>>>> which appears to be the _exact_ solution."
>>>>>>
>>>>>> Thanks for the suggestion.  Using these starting values produces the
>>>>>> exact estimate that Dave Fournier emailed me.
>>>>>> If these are the exact solution then why did the author publish
>>>>>> different answers which are completely reproducible in
>>>>>> Stata and Tsp?
>>>>>
>>>>>
>>>>> Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific paper you are talking about. Please cite the results and explain the differences.
>>>>>
>>>>> Are we maximizing over the same parameter space? You say that the estimates from the paper gives a log-likelihood of 54.04, but the exact solution clocked in at 76.74, which in my book is rather larger.
>>>>>
>>>>> Confused....
>>>>>
>>>>> -p
>>>>>
>>>>>
>>>>>>
>>>>>> Ravi,
>>>>>>
>>>>>> Thanks for introducing optimx to me, I am new to R.  I completely
>>>>>> agree that you can get higher log-likelihood values
>>>>>> than what those obtained with optim and the starting values suggested
>>>>>> by Peter.  In fact, in Stata, when I reproduce
>>>>>> the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.
>>>>>>
>>>>>> Furthermore if I estimate model 8 without symmetry imposed on the
>>>>>> system I reproduce the Likelihood Ratio reported
>>>>>> in the paper to 3 decimal places as well, suggesting that the
>>>>>> log-likelihoods I am reporting differ from those in the paper
>>>>>> only due to a constant.
>>>>>>
>>>>>> Thanks for your comments,
>>>>>>
>>>>>> I am still highly interested in knowing why the results of the
>>>>>> optimisation in R are so different to those in Stata?
>>>>>>
>>>>>> I might try making my convergence requirements more stringent.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> --
>>>>> Peter Dalgaard
>>>>> Center for Statistics, Copenhagen Business School
>>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>>> Phone: (+45)38153501
>>>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>>>
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
>



More information about the R-help mailing list