[R] Help with faster optimization for large parameter problem
Ravi Varadhan
rvaradhan at jhmi.edu
Fri Mar 2 15:50:40 CET 2007
Hi James,
There are a few things that immediately come to my mind that you could try:
1. Profile your code using Rprof to detect which steps are the most
time-consuming.
2. In your likelihood function you can make the code a bit more faster by
using "outer" instead of %*%.
3. Using conjugate-gradient type methods in "optim" might be a better
option since you are dealing with thousands of parameters and "BFGS" could
use up a lot of memory working with a large Hessian matrix. CG methods use
much less storage and memory than quasi-Newton methods. The main caveat
here is that the CG methods generally have slower convergence than QN type
methods, unless you can precondition the problem.
Hope this is helpful,
Ravi.
----------------------------------------------------------------------------
-------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------
--------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of James Fowler
Sent: Friday, March 02, 2007 4:34 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Help with faster optimization for large parameter problem
Hello all,
I have a large parameter problem with the following very simple likelihood
function:
fn<-function(param) {
x1<-param[1:n]
g1<-param[(n+1):(2*n)]
beta<-param[(2*n+1):(2*n+k)]
sigma2<-param[2*n+k+1]^2
meang1sp<-mean(g1[sp])
mu<-beta%*%matrix(x1,1,n)-(g1[sp]-meang1sp)%*%matrix(g1,1,n)
return(sum((ydc-mu)^2)/(2*sigma2) + n*k*log(sqrt(sigma2)) +
mean(x1)^2 + mean(g1)^2 + 1000*(x1[1]>x1[n]))
}
There's nothing special here -- it's just plain old OLS, except all the
variables on the right hand side are parameters (only the variable ydc is
observed). I have no problems getting this to recover parameter estimates
from data I myself have generated for smaller problems (e.g. where ydc is
a k=500 by n=50 matrix and there are 601 parameters). But the target
problem with real data will be k=6000 by n=400 with 6801 parameters.
I am currently using optim(method="BFGS") but it is slow. I can get good
starting values for x1 and g1 with some svd techniques, and these help me
generate the starting values for the betas via lm().
I then use optim() on a modified likelihood function to find g1,x1,sigma2
while holding beta fixed and then use optim() again to find beta while
holding the other variables fixed. But eventually, I have to run optim on
the unmodified likelihood function above and it is very slow, taking
several days for large problems.
I have also tried metrop() in mcmc, but I find this needs to be very
close to the mode of the likelihood to be efficient (in fact, MCMCpack's
metropolis function calls optim first and even requires it to invert the
hessian before even starting the metropolis algorithm, unless we can
provide our own covariance matrix). I will probably use metrop() to
generate standard errors once I find a mode....
In the mean time, I can't help thinking that there is some easy way to
make this much faster than I am currently doing, especially since the
likelihood is normal. I am sure I have missed something obvious so I'd
very much appreciate any advice you could give on packages in R or code
that might help.
Thanks!
james
----------------------------------------------------------------------
James H. Fowler, Associate Professor web: http://jhfowler.ucsd.edu
Department of Political Science email: jhfowler at ucsd.edu
University of California, San Diego phone: (858) 534-6807
Social Sciences Building 383, #0521 fax: (858) 534-7130
9500 Gilman Drive, La Jolla, CA 92093
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list