[R] Problem with constrOptim when hitting boundary

Thu Apr 26 20:02:05 CEST 2012

Hello,

I am using constrOptim to maximize a likelihood function (the values of my parameter vector must be between zero and one and must sum up to <=1). I am getting the error 'initial value in 'vmmin' is not finite'. I've tracked it down to a problem in the function 'R' defined within the constrOptim function. It is performing a check on my values, and its not passing the check, and therefore returning NaN. This is not my initial value, but rather during an internal iteration.

This is the function defined in constrOptim (the following is code from debugging it)

debugging in: R(theta, theta.old)
debug: {
    ui.theta <- ui %*% theta
    gi <- ui.theta - ci
    if (any(gi < 0)) 
        return(NaN)
    gi.old <- ui %*% theta.old - ci
    bar <- sum(gi.old * log(gi) - ui.theta)
    if (!is.finite(bar)) 
        bar <- -Inf
    f(theta, ...) - mu * bar
}

I see that I am getting the gi values that are just machine error below 0 (-8.881784e-16):

Browse[3]> ui.theta
     [,1]
[1,]   -1
[2,]    1
Browse[3]> ci
[1] -1  0
Browse[3]> gi
              [,1]
[1,] -8.881784e-16
[2,]  1.000000e+00

I'm a bit surprised that constrOptim wouldn't be robust to that. I'm wondering what I can do to get around this, if anything. Is this a sign that my function that I defined is not behaving correctly in some way (e.g. should I be handling machine error in the function I defined?). I would note in my example below, I am only optimizing over a one-dimensional parameter, for which I understand optim is not optimal, but my function is general and in other situations it could be a vector of length greater than 1. 

I give an example below, as well as my function and session info. The example is pathological (x, the successes, are equal to m the trials) but I'm running simulations, and this is being created in my simulations. When I run a large set of simulations (over many different parameters) I am actually getting about half the time an error like this, and so I don't know if it is always a pathological example like this. This is just an example I found where I know it failed.

Thanks for your help,
Elizabeth Purdom

Example data:
> x <- c(12, 17,  9, 15, 15,  9, 15, 11, 12,  7) 
> m <- c(12,17,  9, 15, 15,  9, 15, 11, 12,  7)
> AMat<-matrix(c(1,0,0,1),byrow=T,nrow=2,ncol=2)
> eventTiming(x, m, history = AMat)

Error message:
Error in optim(theta.old, fun, gradient, control = control, method = method,  : 
  initial value in 'vmmin' is not finite

Enter a frame number, or 0 to exit   

1: eventTiming(x, m, history = AMat)
2: eventTiming.R#84: constrOptim(theta = init, f = neglik, grad = neggr, ui = 
3: eventTiming.R#84: optim(theta.old, fun, gradient, control = control, method

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.15.0

My function:
eventTiming<-function(x,m,history, exactAllele=FALSE,normCont=0,verbose=FALSE,coverageCutoff=1,minMutations=10){

	A<-history
	if(is.null(dim(A))) stop("'history' should be a matrix of size nCopies x (nEvents +1)")
	nCopies<-nrow(A)
	K<-ncol(A)-1
	possAlleles<-(1:nCopies)/nCopies*(1-normCont)
	nMuts<-length(x)
	if(length(m)!=nMuts) stop("x and m must be of same length")
	if(coverageCutoff<1) stop("coverageCutoff must be at least 1")
	if(any(m<coverageCutoff)){
		if(verbose) warning("some mutations have no reads, will be ignored")
		nBelowCutoff<-length(which(m<coverageCutoff))
		nZero<-length(which(m==0))
		x<-x[which(m>=coverageCutoff)]
		m<-m[which(m>=coverageCutoff)]
	}
	else{
		nBelowCutoff<-0
		nZero<-0
	}
	if(length(m)<minMutations){
		if(length(normCont)>0){
			if(length(possAlleles)!=K+1) stop("invalid length for possible Allele Frequency values")
			if(is.null(names(possAlleles))) names(possAlleles)<-paste("AlleleFreq",1:length(possAlleles),sep="")
			return(list(pi=rep(NA,length=K+1),alleleSet=possAlleles,optimOut=NA,nBelowCutoff=nBelowCutoff,nZero=nZero,nMut=length(m)))
		}
		else return(list(pi=rep(NA,length=K+1),alleleSet=NA,optimOut=NA,nBelowCutoff=nBelowCutoff,nZero=nZero,nMut=length(m)))
	}

	###Make 'weight matrix' of P(X=X[i]|X>0, P=p(k,s)) of size nStages x nCopies for each x -- 
	###an array that is nMuts x nCopies of the prob of X given a certain allele
	probX<-function(p){
		dbinom(x=x,size=m,prob=p)
	}
	wtMat<-sapply(possAlleles,probX)  #nMuts x nAlleles
	if(is.null(dim(wtMat))){
		if(length(x)==1) wtMat<-matrix(wtMat,nrow=1)
		else stop("Programming error -- single row wtMat")
	}
	###Need to convert so in terms of pi[-0]
	transMat<-rbind(rep(-1,K),diag(K))
	Aid<-A%*%transMat
	numGr<-wtMat %*% Aid #N x K matrix
	init<-rep(1/(K+1),length=K) #equal distribution; could perhaps choose better init value
	getProbFromTheta<-function(theta){
		return(c(1-sum(theta),theta))
	}
	neglik<-function(theta){
		probDist<-as.vector(A%*%getProbFromTheta(theta)) #should be a vector of probabilities
		if(length(probDist)!=K+1) stop("Programming error -- wrong length")
		if(any(probDist< -1e-10)) stop("Programming error -- negative probabilities")
		if(any(probDist> 1+1e-10)) stop("Programming error -- >1 probabilities")
		-sum(log(rowSums(sweep(wtMat,2,probDist,FUN="*"))))
	}
	#numerator of grad  because linear in pi -- same regardless of value of theta
	neggr<-function(theta){
		#denominator:
		probDist<-as.vector(A%*%getProbFromTheta(theta))
		if(length(probDist)!=K+1) stop("Programming error -- wrong length")
		if(any(probDist< -1e-10)) stop("Programming error -- negative probabilities")
		if(any(probDist> 1+1e-10)) stop("Programming error -- >1 probabilities")
		denom<-rowSums(sweep(wtMat,2,probDist,FUN="*")) #should be a vector length equal to N

		-sum(sweep(numGr,1,denom,"/"))

	}

	lengthOfTheta<-K
	uiL1<-diag(rep(-1,lengthOfTheta),nrow=lengthOfTheta,ncol=lengthOfTheta)
	ciL1<-rep(1,lengthOfTheta)
	uiGr0<-diag(rep(1,lengthOfTheta),nrow=lengthOfTheta,ncol=lengthOfTheta)
	ciGr0<-rep(0,lengthOfTheta)
	ui<-rbind(uiL1,uiGr0)
	ci<- c(-ciL1,-ciGr0)
	#The feasible region is defined by ui %*% theta - ci >= 0. 
	out<-constrOptim(theta=init,f = neglik, grad=neggr, ui=ui, ci=ci)

	return(list(pi=getProbFromTheta(out$par),alleleSet=possAlleles,optimOut=out,nBelowCutoff=nBelowCutoff,nZero=nZero,nMut=length(m)))
}