[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

Mike Marchywka marchywka at hotmail.com
Sun May 22 02:40:44 CEST 2011

----------------------------------------
> From: rvaradhan at jhmi.edu
> To: marchywka at hotmail.com; pdalgd at gmail.com; alex.olssen at gmail.com
> CC: r-help at r-project.org
> Date: Sat, 21 May 2011 17:26:29 -0400
> Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
>
> Hi,
>
> I don't think the final verdict has been spoken. Peter's posts have hinted at ill-conditioning as the crux of the problem. So, I decided to try a couple more things: (1) standardizing the covariates, (2) exact gradient, and (3) both (1) and (2).

Cool, I thought everyone had lost interest; I'll get back to this then.
Before launching into it, though, I was curious whether the Stata
solution (or whatever other product was used) was thought
to represent a good solution or the actual global optimum.

IIRC, your earlier post along these lines, 

> # exact solution as the starting value
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

was what got me started. 


I guess my point was that the problem would not obviously
have a nice surface to optimize, and IIRC the title of the paper
suggested the system was a bit difficult (I won't pay for med papers,
so I'm certainly not paying for econ LOL). The function to be minimized
(stated as fact here only for the sake of eliciting criticism) is the
determinant of the cross-product matrix of two unrelated residual
vectors. And, from memory,

> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3



this gives something like E = |e1|^2 * |e2|^2 * (1 - Cos^2(d)), with
d being the angle between the residual vectors (in a space whose
dimension is the number of data points). Or, E = F*Sin^2(d), and
depending on the data it would seem possible to move in such a way
that F increases but more slowly than Sin^2(d) decreases, so E still
falls. Any solution for which the residuals are collinear would seem
to be optimal, since there E = 0, the smallest possible value.

I guess if my starting point is not too far off, I may see if
I can find some diagnostics to determine whether the data set creates
the condition I have outlined and confuses the optimizer.

It might, for example, be interesting to see what happens to |e1||e2|
and Sin(d) at various points; a sketch of such a diagnostic follows
below. (T1 is just theta components 1-4 and T2 is 5-8, presuming
"x0" = 1 LOL):

E1 = Y1 - T1*X,  E2 = Y2 - T2*X
E1.E2 = Y1.Y2 - Y1.(T2*X) - (T1*X).Y2 + (T1*X).(T2*X)
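
Something like this, an untested sketch using the same theta/y/x
layout as the lnl() code quoted below, is what I have in mind:

resid.diag <- function(theta, y1, y2, x1, x2, x3) {
  # residuals of the two equations, same layout as lnl() below
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  n1 <- sqrt(sum(e1^2))               # |e1|
  n2 <- sqrt(sum(e2^2))               # |e2|
  cosd <- sum(e1*e2)/(n1*n2)          # Cos(d), d = angle between e1 and e2
  # E = |e1|^2 |e2|^2 Sin^2(d) = det(t(e) %*% e)
  c(norm1 = n1, norm2 = n2, cos.d = cosd,
    E = (n1*n2)^2 * (1 - cosd^2))
}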

I guess I'll keep working on the above and see if it points
to anything pathological, or anything that doesn't play nicely with
the optimizer.




>
> I compute the "exact" gradient using a complex-step derivative approach. This works just like the standard first-order, forward differencing. The only (but, essential) difference is that an imaginary increment, i*dx, is used. This, incredibly, gives exact gradients (up to machine precision).
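
(A quick toy check of the complex-step idea, added here for reference;
the function f below is just a made-up example, not from the original
post:)

f <- function(x) exp(x) * sin(x)
h <- .Machine$double.eps
x0 <- 1
Im(f(x0 + h*1i))/h              # complex-step derivative at x0
exp(x0)*(sin(x0) + cos(x0))     # analytic derivative, for comparison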
>
> Here are the code and the results of my experiments:
>
> data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",")
> attach(data)
> require(optimx)
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det.sigma))
> return(-logl)
> }
>
> csd <- function(fn, x, ...) {
> # Complex step derivatives; yields exact derivatives
> h <- .Machine$double.eps
> n <- length(x)
> h0 <- g <- rep(0, n)
> for (i in 1:n) {
> h0[i] <- h * 1i
> g[i] <- Im(fn(x+h0, ...))/h
> h0[i] <- 0
> }
> g
> }
>
> gr.csd <- function(theta,y1, y2, x1, x2, x3) {
> csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
> }
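
(An optional sanity check, not part of the original experiments: if the
numDeriv package is installed, the complex-step gradient can be
compared against a Richardson-extrapolated numerical one.)

# require(numDeriv)
# g1 <- gr.csd(rep(0.1, 8), y1, y2, x1, x2, x3)
# g2 <- numDeriv::grad(lnl, rep(0.1, 8), y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
# max(abs(g1 - g2))  # should be small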
>
> # exact solution as the starting value
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> # numerical gradient
> p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> # exact gradient
> p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact gradient is used, we also see a major difference for L-BFGS-B
> # Exact gradient did not affect the gradient methods, CG and spg, much. However, convergence of Rcgmin improved when exact gradient was used
>
> x1s <- scale(x1)
> x2s <- scale(x2)
> x3s <- scale(x3)
>
> p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, control=list(all.methods=TRUE, maxit=1500))
>
> # both scaling and exact gradient
> p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, control=list(all.methods=TRUE, maxit=1500))
>
> # Comparing p3 and p3g, use of exact gradient improved spg dramatically
>
> # Of course, the derivative-free methods newuoa and Nelder-Mead are improved by scaling, but not by the availability of exact gradients. I don't know what is wrong with bobyqa in this example.
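
(A practical aside on the scaling experiment, added editorially:
scale() both centers and divides by the standard deviation, so
estimates from the scaled runs have to be mapped back before they can
be compared with original-scale solutions. A sketch, assuming the
usual intercept-plus-three-slopes layout per equation:)

unscale4 <- function(b, x1, x2, x3) {
  # b = (intercept, 3 slopes) fitted on scale()d covariates
  m <- c(mean(x1), mean(x2), mean(x3))
  s <- c(sd(x1), sd(x2), sd(x3))
  slopes <- b[2:4] / s
  c(b[1] - sum(slopes * m), slopes)   # original-scale (intercept, slopes)
}
# apply to theta[1:4] and theta[5:8] from the scaled fits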
>
> In short, even with scaling and exact gradients, this optimization problem is recalcitrant.
>
> Best,
> Ravi.
>
> ________________________________________
> From: Mike Marchywka [marchywka at hotmail.com]
> Sent: Thursday, May 12, 2011 8:30 AM
> To: Ravi Varadhan; pdalgd at gmail.com; alex.olssen at gmail.com
> Cc: r-help at r-project.org
> Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
>
> So what was the final verdict on this discussion? I kind of
> lost track; if anyone has a minute, could you summarize, and critique my summary below?
>
>
> Apparently there were two issues: the comparison between R and Stata
> was one, and the "optimum" solution itself another. As I understand it,
> there was some question about R's numerical gradient calculation, which
> would suggest that some features of the function may be worth examining.
>
> The function to be optimized appears to be, as the OP stated,
> some function of the residuals of two (unrelated) fits.
> The residual vectors e1 and e2 are dotted in various combinations,
> creating a matrix whose determinant is (e1.e1)(e2.e2) - (e1.e2)^2, which
> is the quantity to be minimized by choice of theta. Theta, it seems, is
> an 8-component vector; 4 components determine e1 and the other 4 e2.
> Presumably a unique solution would require that e1 and e2, both n-component vectors,
> point in different directions, or else both could become arbitrarily large
> while keeping the error signal at zero. For fixed magnitudes, collinearity
> would reduce the "Error." The intent would appear to be to
> keep the residuals distributed similarly across the two (unrelated) fits.
> I guess my question is,
> "did anyone determine that there is a unique solution?", or
> am I totally wrong here? (I haven't used these optimizers myself to any
> extent and just try to run some simple teaching examples; I'm asking
> for my own clarification as much as anything.)
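
(A toy illustration of that worry, added editorially: collinear
residual vectors drive the determinant to zero no matter how large the
residuals themselves become.)

e1 <- c(1, 2, 3)
e2 <- 10 * e1                             # collinear with e1, 10x larger
sum(e1*e1)*sum(e2*e2) - sum(e1*e2)^2      # determinant = 0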
>
> Thanks.
>
> ----------------------------------------
> > From: rvaradhan at jhmi.edu
> > To: pdalgd at gmail.com; alex.olssen at gmail.com
> > Date: Sat, 7 May 2011 11:51:56 -0400
> > CC: r-help at r-project.org
> > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
> >
> > There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from "optimx". You can get much larger log-likelihood values than for the exact solution that Peter provided.
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)%*%e
> > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is something wrong here
> > return(-logl)
> > }
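
(An editorial aside, offered tentatively: the textbook concentrated
log-likelihood would use det(t(e)%*%e/n) rather than det(t(e)%*%e);
since det(A/n) = det(A)/n^2 for a 2x2 matrix A, the two differ only by
the additive constant n*log(n) and share the same maximizer, so that
alone cannot explain the larger-than-exact values. Those values are,
however, consistent with the collinearity degeneracy discussed above:
to the extent the two residual vectors can be pushed toward
collinearity, det(sigma) heads toward 0 and the reported
log-likelihood grows without bound.)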
> >
> > data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
> >
> > attach(data)
> >
> > require(optimx)
> >
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >
> > # the warnings can be safely ignored in the "optimx" calls
> > p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> >              x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> >              x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> >              x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > Ravi.
> > ________________________________________
> > From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of peter dalgaard [pdalgd at gmail.com]
> > Sent: Saturday, May 07, 2011 4:46 AM
> > To: Alex Olssen
> > Cc: r-help at r-project.org
> > Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
> >
> > On May 6, 2011, at 14:29 , Alex Olssen wrote:
> >
> > > Dear R-help,
> > >
> > > I am trying to reproduce some results presented in a paper by Anderson
> > > and Blundell in 1982 in Econometrica using R.
> > > The estimation I want to reproduce concerns maximum likelihood
> > > estimation of a singular equation system.
> > > I can estimate the static model successfully in Stata but for the
> > > dynamic models I have difficulty getting convergence.
> > > My R program, which uses the same likelihood function as in Stata, has
> > > convergence problems even for the static case.
> > >
> > > I have copied my R program and the data below. I realise the code
> > > could be made more elegant - but it is short enough.
> > >
> > > Any ideas would be highly appreciated.
> >
> > Better starting values would help. In this case, almost too good values are available:
> >
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >
> > which appears to be the _exact_ solution.
> >
> > Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing the maxit gets you closer but still not satisfactory.
> >
> > I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably "nlm" and "nlminb".
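
(For concreteness, an editorial sketch of the nlminb route suggested
above, using the same lnl and data quoted below; untested here:)

fit <- nlminb(rep(0, 8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3,
              control=list(iter.max=1500, eval.max=2000))
fit$par        # parameter estimates
fit$objective  # negative log-likelihood at the optimum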
> >
> >
> >
> >
> > >
> > > ## model 18
> > > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > > n <- length(y1)
> > > beta <- theta[1:8]
> > > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > > e <- cbind(e1, e2)
> > > sigma <- t(e)%*%e
> > > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
> > > return(-logl)
> > > }
> > > p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2,
> > > x1=x1, x2=x2, x3=x3)
> > >
> > > "year","y1","y2","x1","x2","x3"
> > > 1929,0.554779,0.266051,9.87415,8.60371,3.75673
> > > 1930,0.516336,0.297473,9.68621,8.50492,3.80692
> > > 1931,0.508201,0.324199,9.4701,8.27596,3.80437
> > > 1932,0.500482,0.33958,9.24692,7.99221,3.76251
> > > 1933,0.501695,0.276974,9.35356,7.98968,3.69071
> > > 1934,0.591426,0.287008,9.42084,8.0362,3.63564
> > > 1935,0.565047,0.244096,9.53972,8.15803,3.59285
> > > 1936,0.605954,0.239187,9.6914,8.32009,3.56678
> > > 1937,0.620161,0.218232,9.76817,8.42001,3.57381
> > > 1938,0.592091,0.243161,9.51295,8.19771,3.6024
> > > 1939,0.613115,0.217042,9.68047,8.30987,3.58147
> > > 1940,0.632455,0.215269,9.78417,8.49624,3.57744
> > > 1941,0.663139,0.184409,10.0606,8.69868,3.6095
> > > 1942,0.698179,0.164348,10.2892,8.84523,3.66664
> > > 1943,0.70459,0.146865,10.4731,8.93024,3.65388
> > > 1944,0.694067,0.161722,10.4465,8.96044,3.62434
> > > 1945,0.674668,0.197231,10.279,8.82522,3.61489
> > > 1946,0.635916,0.204232,10.1536,8.77547,3.67562
> > > 1947,0.642855,0.187224,10.2053,8.77481,3.82632
> > > 1948,0.641063,0.186566,10.2227,8.83821,3.96038
> > > 1949,0.646317,0.203646,10.1127,8.82364,4.0447
> > > 1950,0.645476,0.187497,10.2067,8.84161,4.08128
> > > 1951,0.63803,0.197361,10.2773,8.9401,4.10951
> > > 1952,0.634626,0.209992,10.283,9.01603,4.1693
> > > 1953,0.631144,0.219287,10.3217,9.06317,4.21727
> > > 1954,0.593088,0.235335,10.2101,9.05664,4.2567
> > > 1955,0.60736,0.227035,10.272,9.07566,4.29193
> > > 1956,0.607204,0.246631,10.2743,9.12407,4.32252
> > > 1957,0.586994,0.256784,10.2396,9.1588,4.37792
> > > 1958,0.548281,0.271022,10.1248,9.14025,4.42641
> > > 1959,0.553401,0.261815,10.2012,9.1598,4.4346
> > > 1960,0.552105,0.275137,10.1846,9.19297,4.43173
> > > 1961,0.544133,0.280783,10.1479,9.19533,4.44407
> > > 1962,0.55382,0.281286,10.197,9.21544,4.45074
> > > 1963,0.549951,0.28303,10.2036,9.22841,4.46403
> > > 1964,0.547204,0.291287,10.2271,9.23954,4.48447
> > > 1965,0.55511,0.281313,10.2882,9.26531,4.52057
> > > 1966,0.558182,0.280151,10.353,9.31675,4.58156
> > > 1967,0.545735,0.294385,10.3351,9.35382,4.65983
> > > 1968,0.538964,0.294593,10.3525,9.38361,4.71804
> > > 1969,0.542764,0.299927,10.3676,9.40725,4.76329
> > > 1970,0.534595,0.315319,10.2968,9.39139,4.81136
> > > 1971,0.545591,0.315828,10.2592,9.34121,4.84082
> > >
> >
> > --
> > Peter Dalgaard
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
> >