\documentclass[english]{article}
\begin{document}

%\VignetteIndexEntry{BB Tutorial}
%\VignetteDepends{numDeriv, setRNG}
%\VignetteKeywords{accelerated failure time model, Barzilai-Borwein, derivative-free, estimating equations, large-scale optimization, non-monotone line search, non-smooth optimization, rank-based regression}
%\VignettePackage{BB}

\SweaveOpts{eval=TRUE,echo=TRUE,results=verbatim,fig=FALSE,keep.source=TRUE}

\begin{Scode}{echo=FALSE,results=hide}
options(continue=" ")
\end{Scode}

\section{Overview of BB}

``BB'' is a package intended for two purposes: (1) solving a nonlinear system of equations, and (2) finding a local optimum (minimum or maximum) of a scalar objective function. An attractive feature of the package is its minimal memory requirement, which makes it particularly well suited to solving high-dimensional problems with tens of thousands of parameters. However, \emph{BB} can also be used to solve a single nonlinear equation or to optimize a function of just one variable. The functions in this package are made available with:

\begin{Scode}
library("BB")
\end{Scode}

You can look at the basic information on the package, including all the available functions, with

\begin{Scode}{eval=FALSE,results=hide}
help(package=BB)
\end{Scode}

The three basic functions are \emph{spg}, \emph{dfsane}, and \emph{sane}. You should use \emph{spg} for optimization, and either \emph{dfsane} or \emph{sane} for solving a nonlinear system of equations. We prefer \emph{dfsane}, since it tends to perform slightly better than \emph{sane}.

There are also three higher-level functions: \emph{BBoptim}, \emph{BBsolve}, and \emph{multiStart}. \emph{BBoptim} is a wrapper for \emph{spg} in the sense that it calls \emph{spg} repeatedly with different algorithmic options. It can be used when \emph{spg} fails to find a local optimum, or simply in place of \emph{spg}. Similarly, \emph{BBsolve} is a wrapper for \emph{dfsane} in the sense that it calls \emph{dfsane} repeatedly with different algorithmic options. It can be used when \emph{dfsane} (or \emph{sane}) fails to find a root, or simply in place of \emph{dfsane} (or \emph{sane}). The \emph{multiStart} function accepts multiple starting values. It can be used either for solving a nonlinear system or for optimization, and it is useful both for exploring sensitivity to starting values and for finding multiple solutions.

The package \emph{setRNG} is not necessary, but if you want to reproduce the examples in this guide exactly then do this:

\begin{Scode}
require("setRNG")
setRNG(list(kind="Wichmann-Hill", normal.kind="Box-Muller", seed=1236))
\end{Scode}

after which the examples need to be run in the order given here (or at least the parts that generate random numbers). For some examples the RNG is reset again so that they can be reproduced more easily.

\section{How to solve a nonlinear system of equations with BB}

The first two examples are from La Cruz and Raydan (Optimization Methods and Software 2003, 18, p.~583--599).

\begin{Scode}
expo3 <- function(p) {
# From La Cruz and Raydan, Optimization Methods and Software 2003, 18, 583-599
    n <- length(p)
    f <- rep(NA, n)
    onm1 <- 1:(n-1)
    f[onm1] <- onm1/10 * (1 - p[onm1]^2 - exp(-p[onm1]^2))
    f[n] <- n/10 * (1 - exp(-p[n]^2))
    f
}

p0 <- runif(10)
ans <- dfsane(par=p0, fn=expo3)
ans
\end{Scode}

Let us look at the output from \emph{dfsane}. It is a list with 7 components. The most important components to focus on are the two named ``\emph{par}'' and ``\emph{convergence}''. \emph{ans\$par} provides the solution from \emph{dfsane}, but it is a root if and only if \emph{ans\$convergence} is equal to \emph{0}, i.e.\ \emph{ans\$message} should say ``Successful convergence''. Otherwise, the algorithm has failed.
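For instance, a minimal check of the result (a sketch using only the components named above) is to test \emph{ans\$convergence} and then evaluate the residual at \emph{ans\$par}:

\begin{Scode}
if (ans$convergence == 0) {
    root <- ans$par          # approximate root returned by dfsane
    max(abs(expo3(root)))    # largest residual; should be close to zero
}
\end{Scode}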
Now, we show an example demonstrating the ability of BB to solve a large system of equations, here with $N = 10000$.

\begin{Scode}
trigexp <- function(x) {
    n <- length(x)
    F <- rep(NA, n)
    F[1] <- 3*x[1]^2 + 2*x[2] - 5 + sin(x[1] - x[2]) * sin(x[1] + x[2])
    tn1 <- 2:(n-1)
    F[tn1] <- -x[tn1-1] * exp(x[tn1-1] - x[tn1]) + x[tn1] * (4 + 3*x[tn1]^2) +
        2 * x[tn1 + 1] + sin(x[tn1] - x[tn1 + 1]) * sin(x[tn1] + x[tn1 + 1]) - 8
    F[n] <- -x[n-1] * exp(x[n-1] - x[n]) + 4*x[n] - 3
    F
}

n <- 10000
p0 <- runif(n)
ans <- dfsane(par=p0, fn=trigexp, control=list(trace=FALSE))
ans$message
ans$resid
\end{Scode}

The next example is the Freudenstein and Roth function (Broyden, Mathematics of Computation 1965, p.~577--593).

\begin{Scode}
froth <- function(p){
    f <- rep(NA, length(p))
    f[1] <- -13 + p[1] + (p[2] * (5 - p[2]) - 2) * p[2]
    f[2] <- -29 + p[1] + (p[2] * (1 + p[2]) - 14) * p[2]
    f
}
\end{Scode}

Now, we introduce the function \emph{BBsolve}. For the first starting value, \emph{dfsane} used in the default manner does not find the zero, but \emph{BBsolve}, which tries multiple control-parameter settings, finds it successfully.

\begin{Scode}
p0 <- c(3,2)
dfsane(par=p0, fn=froth, control=list(trace=FALSE))
BBsolve(par=p0, fn=froth)
\end{Scode}

Note that the functions \emph{dfsane}, \emph{sane}, and \emph{spg} produce a warning message if convergence fails. These warnings have been suppressed in this vignette.

For the next starting value, \emph{BBsolve} finds the zero of the system, but \emph{dfsane} (with defaults) fails.

\begin{Scode}
p0 <- c(1,1)
BBsolve(par=p0, fn=froth)
dfsane(par=p0, fn=froth, control=list(trace=FALSE))
\end{Scode}

Now try random starting values, running the following piece of code many times. This shows that \emph{BBsolve} is quite robust in finding the zero, whereas \emph{dfsane} (with defaults) is sensitive to starting values. Admittedly, these are poor starting values, but it is still nice to have a strategy with a high likelihood of finding a zero of the nonlinear system.

\begin{Scode}
# two values generated independently from a Poisson distribution with mean = 10
p0 <- rpois(2, 10)
BBsolve(par=p0, fn=froth)
dfsane(par=p0, fn=froth, control=list(trace=FALSE))
\end{Scode}
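To make this comparison concrete, a small simulation along the following lines (a sketch, not evaluated in this vignette; the number of replications is arbitrary) tallies how often each solver converges from such random starting values:

\begin{Scode}{eval=FALSE}
nsim <- 100
conv <- matrix(NA, nsim, 2, dimnames=list(NULL, c("BBsolve", "dfsane")))
for (k in 1:nsim) {
    p0 <- rpois(2, 10)
    conv[k, 1] <- BBsolve(par=p0, fn=froth)$convergence == 0
    conv[k, 2] <- dfsane(par=p0, fn=froth,
        control=list(trace=FALSE))$convergence == 0
}
colMeans(conv)   # proportion of successful runs for each solver
\end{Scode}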
\subsection{Finding multiple roots of a nonlinear system of equations}

Now, we introduce the function \emph{multiStart}. This accepts a matrix of starting values, where each row is a single starting value, and calls \emph{BBsolve} for each of them. Here is a system of 3 nonlinear equations, where each equation is a high-degree polynomial. This system has 12 real-valued roots and 126 complex-valued roots, and we will demonstrate how to identify all 12 real roots using \emph{multiStart}. Note that we specify the `\emph{action}' argument in the following call to \emph{multiStart} only to highlight that \emph{multiStart} can be used both for solving a system of equations and for optimization. The default is `\emph{action = "solve"}', so it is really not needed in this call.

\begin{Scode}
# Example
# A high-degree polynomial system (R.B. Kearfott, ACM 1987)
# There are 12 real roots (and 126 complex roots to this system!)
#
hdp <- function(x) {
    f <- rep(NA, length(x))
    f[1] <- 5 * x[1]^9 - 6 * x[1]^5 * x[2]^2 + x[1] * x[2]^4 + 2 * x[1] * x[3]
    f[2] <- -2 * x[1]^6 * x[2] + 2 * x[1]^2 * x[2]^3 + 2 * x[2] * x[3]
    f[3] <- x[1]^2 + x[2]^2 - 0.265625
    f
}
\end{Scode}

We generate 100 random starting values, each a vector of length 3. (Setting the seed is only necessary to reproduce the result shown here.)

\begin{Scode}{results=hide}
setRNG(list(kind="Wichmann-Hill", normal.kind="Box-Muller", seed=123))
p0 <- matrix(runif(300), 100, 3)  # 100 starting values, each of length 3
ans <- multiStart(par=p0, fn=hdp, action="solve")
sum(ans$conv)  # number of successful runs = 99
pmat <- ans$par[ans$conv, ]  # selecting only the converged solutions
\end{Scode}

Now, we display the unique real solutions.

\begin{Scode}
ans <- round(pmat, 4)
ans[!duplicated(ans), ]
\end{Scode}
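As a quick check (a sketch, not part of the original analysis), we can confirm that each distinct solution really is a root by evaluating \emph{hdp} at it; since the solutions were rounded to 4 decimals above, the residuals should be small rather than exactly zero.

\begin{Scode}
roots <- ans[!duplicated(ans), ]
apply(roots, 1, function(z) max(abs(hdp(z))))  # near zero, up to rounding
\end{Scode}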
We can also visualize these 12 solutions using a `biplot' based on the first 2 principal components of the converged parameter matrix.

\begin{Scode}{fig=TRUE}
pc <- princomp(pmat)
biplot(pc)  # you can see all 12 solutions beautifully, laid out like a clock!
\end{Scode}

\subsection{Power polynomial method: Fleishman system of equations}

Fleishman (Psychometrika 1978, p.~521--532) developed an approach for simulating random numbers from non-normal distributions with specified values of skewness and kurtosis. This approach involves the solution of a system of polynomial equations, which is also discussed by Demirtas and Hedeker (Communications in Statistics 2008, p.~1682--1695; equations on p.~1684) and is given as follows:

\begin{Scode}
fleishman <- function(x, r1, r2) {
    b <- x[1]
    c <- x[2]
    d <- x[3]
    f <- rep(NA, 3)
    f[1] <- b^2 + 6 * b * d + 2 * c^2 + 15 * d^2 - 1
    f[2] <- 2 * c * (b^2 + 24 * b * d + 105 * d^2 + 2) - r1
    f[3] <- b * d + c^2 * (1 + b^2 + 28 * b * d) +
        d^2 * (12 + 48 * b * d + 141 * c^2 + 225 * d^2) - r2/24
    f
}
\end{Scode}

We use only 3 equations, since the first equation of Fleishman's system is trivially solved by $a = -c$. Here we reproduce an experiment based on Fleishman (1978), picking 10 scenarios (more or less randomly) from Table 1 of that paper:

\begin{Scode}
rmat <- matrix(NA, 10, 2)
rmat[1,] <- c(1.75, 3.75)
rmat[2,] <- c(1.25, 2.00)
rmat[3,] <- c(1.00, 1.75)
rmat[4,] <- c(1.00, 0.50)
rmat[5,] <- c(0.75, 0.25)
rmat[6,] <- c(0.50, 3.00)
rmat[7,] <- c(0.50, -0.50)
rmat[8,] <- c(0.25, -1.00)
rmat[9,] <- c(0.0, -0.75)
rmat[10,] <- c(-0.25, 3.75)
\end{Scode}

We solve the system of equations for the above 10 specifications of skewness and kurtosis 3 times, each time with a different random-number seed. Since the three runs differ only in the seed, we wrap the computation in a small function and call it once per seed.

\begin{Scode}
solveFleishman <- function(seed) {
    setRNG(list(kind="Mersenne-Twister", normal.kind="Inversion", seed=seed))
    ans <- matrix(NA, nrow(rmat), 3)
    for (i in 1:nrow(rmat)) {
        x0 <- rnorm(3)  # random starting value
        temp <- BBsolve(par=x0, fn=fleishman, r1=rmat[i,1], r2=rmat[i,2])
        if (temp$conv == 0) ans[i, ] <- temp$par
    }
    ans <- cbind(rmat, ans)
    colnames(ans) <- c("skew", "kurtosis", "B", "C", "D")
    ans
}

# 1
ans1 <- solveFleishman(seed=13579)
ans1
# 2
ans2 <- solveFleishman(seed=91357)
ans2
# 3
ans3 <- solveFleishman(seed=79135)
ans3
\end{Scode}

This usually finds an accurate root of the Fleishman system in all 30 cases (although it may occasionally fail with different seeds). An interesting aspect of this exercise is the existence of multiple roots to the Fleishman system. There are 4 valid roots for any ``feasible'' combination of skewness and kurtosis. These 4 roots can be denoted $(b_1, c_1, -d_1)$, $(-b_1, c_1, d_1)$, $(b_2, c_2, -d_2)$, and $(-b_2, c_2, d_2)$, where $b_1, c_1, d_1, b_2, c_2, d_2$ are all positive (except that the coefficient $c$ is zero when the skewness is zero). Fleishman reports only the first root, whereas we can locate the other roots using \emph{BBsolve}. These experiments demonstrate quite convincingly that the wrapper function \emph{BBsolve} can successfully solve the system of equations associated with the power polynomial method of Fleishman.

\section{How to optimize a nonlinear objective function with BB}

The basic function for optimization is \emph{spg}. It can solve smooth nonlinear optimization problems with box constraints, and also other types of constraints via projection. We direct the user to the help page for many examples of how to use \emph{spg}. Here we discuss an example involving the estimation of parameters that maximize the log-likelihood of a binary Poisson mixture distribution.

\begin{Scode}
poissmix.loglik <- function(p, y) {
# Log-likelihood for a binary Poisson mixture distribution
    i <- 0:(length(y) - 1)
    loglik <- y * log(p[1] * exp(-p[2]) * p[2]^i / exp(lgamma(i+1)) +
        (1 - p[1]) * exp(-p[3]) * p[3]^i / exp(lgamma(i+1)))
    return(sum(loglik))
}

# Data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death=0:9,
    freq=c(162,267,271,185,111,61,27,8,3,1))
\end{Scode}
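Before optimizing, a quick sanity check (a sketch; the trial values below are arbitrary but feasible) is to evaluate the log-likelihood once:

\begin{Scode}
poissmix.loglik(p=c(0.5, 2, 4), y=poissmix.dat$freq)  # arbitrary trial values
\end{Scode}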
There are 3 model parameters, which have restricted domains, so we define box constraints as follows:

\begin{Scode}
lo <- c(0, 0, 0)        # lower limits for the parameters
hi <- c(1, Inf, Inf)    # upper limits for the parameters
\end{Scode}

Now, we maximize the log-likelihood function using both \emph{spg} and \emph{BBoptim}, with a randomly generated starting value for the 3 parameters:

\begin{Scode}
p0 <- runif(3, c(0.2,1,1), c(0.8,5,8))  # a randomly generated vector of length 3
y <- c(162,267,271,185,111,61,27,8,3,1)
ans1 <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
    control=list(maximize=TRUE, trace=FALSE))
ans1
ans2 <- BBoptim(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
    control=list(maximize=TRUE))
ans2
\end{Scode}

Note that we had to specify the `\emph{maximize}' option inside the \emph{control} list to let the algorithm know that we are maximizing the objective function, since the default is to minimize it. Also note how we pass the data vector `\emph{y}' to the log-likelihood function, \emph{poissmix.loglik}.

Now, we illustrate how to compute the Hessian of the log-likelihood at the MLE, and then how to use the Hessian to compute standard errors for the parameters. To compute the Hessian we require the package \emph{numDeriv}.

\begin{Scode}
require(numDeriv)
hess <- hessian(x=ans2$par, func=poissmix.loglik, y=y)
# Note that we have to supply the data vector `y'
hess
se <- sqrt(diag(solve(-hess)))
se
\end{Scode}

Now, we explore the use of multiple starting values to see whether we can identify multiple local maxima. We have to make sure that we specify `\emph{action = "optimize"}', because the default option in \emph{multiStart} is \emph{"solve"}.

\begin{Scode}
# 10 randomly generated starting values
p0 <- matrix(runif(30, c(0.2,1,1), c(0.8,8,8)), 10, 3, byrow=TRUE)
ans <- multiStart(par=p0, fn=poissmix.loglik, action="optimize",
    y=y, lower=lo, upper=hi, control=list(maximize=TRUE))
# selecting only the converged solutions
pmat <- round(cbind(ans$fvalue[ans$conv], ans$par[ans$conv, ]), 4)
dimnames(pmat) <- list(NULL,
    c("fvalue", "parameter 1", "parameter 2", "parameter 3"))
pmat[!duplicated(pmat), ]
\end{Scode}

Here \emph{multiStart} is able to identify many solutions. Two of these, in the 2nd and 3rd rows, appear to be global maxima with different parameter values. In fact, there is only one global maximum; we see 2 such solutions because of the `label switching' problem. The \emph{multiStart} run also identifies three local maxima with inferior objective-function values.
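The label-switching symmetry is easy to verify directly: swapping the two mixture components, i.e.\ replacing $(p_1, p_2, p_3)$ by $(1 - p_1, p_3, p_2)$, leaves the likelihood unchanged. A minimal sketch, using arbitrary illustrative parameter values (not the MLE):

\begin{Scode}
p <- c(0.4, 1.5, 2.5)             # illustrative (mixing prob., mean 1, mean 2)
pswap <- c(1 - p[1], p[3], p[2])  # same mixture, component labels swapped
poissmix.loglik(p, y) - poissmix.loglik(pswap, y)  # zero: identical likelihood
\end{Scode}

\end{document}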