[R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

Sat May 7 21:00:57 CEST 2011

On May 7, 2011, at 17:51 , Ravi Varadhan wrote:

> There is something strange in this problem.  I think the log-likelihood is incorrect.  See the results below from "optimx".  You can get much larger log-likelihood values than for the exact solution that Peter provided.
> 
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
>  n <- length(y1)
>  beta <- theta[1:8]
>  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
>  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
>  e <- cbind(e1, e2)
>  sigma <- t(e)%*%e
>  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
>  return(-logl)
> }
> 
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
> 
> attach(data)
> 
> require(optimx)
> 
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> 
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> 

Hm? I get stuff like below (for p3). Some of the entries have a considerably larger NEGATIVE log likelihood, but that's hardly a problem with the likelihood per se.  

         fvalues      method   fns  grs itns conv  KKT1  KKT2 xtimes
12 8.988466e+307      newuoa    NA   NA   NA 9999    NA    NA  0.002
11 8.988466e+307      bobyqa    NA   NA   NA 9999    NA    NA  0.001
3       23.66768 Nelder-Mead  1501   NA NULL    1 FALSE FALSE   0.18
7      -51.76068         spg  1925   NA 1501    1 FALSE FALSE  2.322
4      -55.78708    L-BFGS-B  2093 2093 NULL    0 FALSE FALSE  4.176
2      -70.57023          CG  5360 1501 NULL    1 FALSE  TRUE  3.465
1      -70.66286        BFGS 21481 1500 NULL    1 FALSE  TRUE  5.383
8      -76.73765      ucminf  1500 1500 NULL    0 FALSE  TRUE  0.067
9      -76.73871      Rcgmin  2434  867 NULL    0 FALSE  TRUE  1.514
10     -76.73877      Rvmmin   231   45 NULL    0 FALSE  TRUE  0.101
6      -76.73878      nlminb   130  581   67    0  TRUE  TRUE  0.085
5      -76.73878         nlm    NA   NA   46    0  TRUE  TRUE  0.058

I must admit that I didn't check the likelihood in detail, but minimizing the determinant of the residual SSD matrix is generally what you end up doing in this sort of models. (Of course, you can safely lose the constants, and you can also think up more stable ways of computing the log-determinant, but it's only 2x2 for cryin' out loud.)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com