[R] using near-zero probabilities in optimization

Benedikt Gehr, Institut für Evolutionsbiologie und Umweltwissenschaften benedikt.gehr at ieu.uzh.ch
Wed Mar 10 10:18:55 CET 2010


Dear Ben

I haven't looked at the profiles yet, I'm not sure how to.
I'm doing population reconstruction from age-at-harvest data. I'm using a 
multinomial distribution to model the observed harvest numbers.
But I didn't know about the Dirichlet-Multinomial so far. I think that might 
really be a good option because overdispersion is likeliy to be an issue 
here.
Below I have copied my code for the simulation of data (the first example 
works, the other gives poor fit) and subsequent optimization. I'm sorry the 
code is very long, but I didn't know how to short it down.

The problem:
Q=harvest rate -> if harvest rates are too small the model fits the data 
very bad.

Here is the R code:

library(boot)
library(bbmle)
library(demogR)
library(msm)

##########################################
##########################################
##Stochastic model -> DATA SIMULATION
##########################################

a<-8										# nr of age classes
y<-30										# nr of years

m<-c(rep(0.2,2),rep(0.75,4),0.5,0.2)					# productivity -> nr of 
young/female
m.time<-matrix(rtnorm(a*y,m,0.1,lower=0),ncol=a,byrow=T)


############################################################
##TWO VERSIONS OF Q=HARVEST RATE->Q1 WORKS, Q2 GIVES BAD FIT
############################################################

Q<-c(rep(0.2,2),rep(0.3,3),0.4,0.6,0.9)	#WORKS		# true mean harvest 
mortality rates
Q<-c(rep(0.02,2),rep(0.03,3),0.4,0.6,0.9)	#BAD FIT		# true mean harvest 
mortality rates

Q.time<-matrix(rtnorm(a*y,Q,0.06,lower=0,upper=1),ncol=a,byrow=T)	# adding 
random noise
#############################################################
#############################################################


P<-c(0.5,rep(0.9,4),rep(0.8,2),0.7)					# natural survival rates
P.time<-matrix(rtnorm(a*y,P,0.06,lower=0,upper=1),ncol=a,byrow=T)	# adding 
random noise


L.time<-(1-Q.time[,1:(a-1)])*P.time[,2:a]				# Overall survival  -> Derived 
parameter

F.time<-m.time*P.time[,1]						# Fecundity 	  -> Derived parameter

Lm.time<-array(0,c(a,a,y))						# Lesli-matrix-array -> for each year a new 
matrix
for (i in 1:y){
Lm.time[,,i]<-odiag(L.time[i,],-1)					# survival in the off-diagonal
}

for (i in 1:y){								# fecundity in the first row
Lm.time[1,,i]<-F.time[i,]
}

Hm.time<-array(0,c(a,a,y))						# Harvest-matrix-array -> for each year a 
new matrix
for (i in 1:y){
Hm.time[,,i]<-odiag(Q.time[i,])					# mortality rates in the diagonal
}

N1<-rep(200,a)								# Define the starting vector of cohort sizes
N.true<-matrix(rep(0,a*y),ncol=a)					# True abundance matrix
N.true[1,]<-N1
C<-matrix(rep(0,a*y),ncol=a)						# Harvest matrix

for (i in 1:(y-1)){
N.true[i+1,]<-Lm.time[,,i]%*%N.true[i,]
}
N.true<-round(N.true)							# round the counts of the true population

for (i in 1:y){
C[i,]<-Hm.time[,,i]%*%N.true[i,]
}
C<-round(C)									#round the counts of the harvest numbers

plot(rowSums(N.true)~seq(1:y),ylim=c(0,4000),col="red")	# Plot the time 
series of the simulated data
lines(rowSums(N.true)~seq(1:y),col="red")				# true population and true 
harvest numbers
points(rowSums(C)~seq(1:y),ylim=c(0,4000),col="blue")
lines(rowSums(C)~seq(1:y),col="blue")


###########################################################################
###########################################################################

############################
##ANALYSIS OF SIMULATED DATA
############################

##Simulated harvest numbers of a cohort harvested over 30 years (including 
random noise)
pop<-N.true							# True population size
cohort<-C							# harvest matrix
R<-nrow(cohort)
L<-ncol(cohort)
	
h<-matrix(rep(Q,R),ncol=a,byrow=T) 	# estimated harvest rates
S<-matrix(rep(P[-1],R),ncol=a-1,byrow=T) 	# survival rates

##############################################
##Function to calculate the cell probabilities
##############################################

predprob <- function(S=S,h=h) {
   R<-nrow(cohort)
   L<-ncol(cohort)
   p <- matrix(rep(0,(R)*(L)),ncol=L)		# matrix of cell probabilities
   	p[R,1]<-h[R,1]
	p[1,L]<-h[1,L]
   ay<-rep(0,L+R+1)					# vector of last cell probabilities -> 
1-sum(rest));Gove(2002)
   	ay[L+R]<-1-h[1,L]
	ay[2]<-1-h[R,1]
   index<-3							# ay[1]=Null, ay[2]=1-h[R,1],ay[L+R]=1-h[1,L], 
ay[L+R+1]=Null


for (i in -(R-2):(L-2)){				# Calculating the cell prob after Gove (2002)

	p[row(p) == col(p) - i] <- 
	odiag(h,i)*c(1,cumprod((1-odiag(h[1:(R-1),1:(L-1)],i))
						*odiag(S[1:(R-1),],i)))
	
	ay[index]<-1-sum(odiag(p[1:R,],i))
	
	index<-index+1
    }

	p<-rbind(p,ay[1:L])
	p<-cbind(p,ay[length(ay):(L+1)])
return(p)

  }

predprob(S,h)						#test the function

####################################
##Function to calculate the multinomial ->from Ben Bolker
####################################

## multinom WITHOUT ROUNDING & without error-checking
dmnom <- function(x,prob,log=FALSE) {
   r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
   if (log) r else exp(r)
}

###########################
## function to be optimized
###########################

mfun <- function(logN,S=S,h=h,cohort=cohort) {
   L<-ncol(cohort)
   R<-nrow(cohort)
   d<-matrix(rep(NA,(R+1)*(L+1)),ncol=L+1)		# last row and last col of the 
matrix
   d[1:R,1:L]<-log(cohort)				# are the values to be optimized->Log 
tranformed data!

   index<-1
   index1<-(R+L)

for (i in -(R-1):(L-1)){				# values for the top row
if (i<= -(R-L) ){
d[(R+1),(index+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))} else
if (i> -(R-L)& i<1 ){
d[index1,(L+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))} else
if (i>0){
d[index1,(L+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))}

index<-index+1
index1<-index1-1
}
   
pp<-predprob(S,h)

lhood<-0							# The Likelihood function -> to be maximized

for (i in -(R-1):(L-1)){
  lhood<-lhood+(-dmnom(exp(odiag(d,i)),prob=odiag(pp,i),log=TRUE))
}
return(lhood)
}

logN<-log(rep(1000,(R+L-1)))  			# Test the function with test-values
mfun(logN,S,h,cohort)

##########################################################################################
##########################################################################################

######################################
##Starting values for the optimization
######################################

N<-rep(500,(R+L-1))					# optimization,->N-x1-x2-x3
svec = log(N) 						# transform to avoid constraints (x>0)
names(svec) <- paste("logN",1:(R+L-1),sep="")
parnames(mfun) <- names(svec)

######################################
##Lower bounds for the L-BFGS-B-method
######################################

index<-1
lower<-rep(0,(R+L-1))

for (i in -(R-1):(L-1)){
lower[index]<-log(sum(odiag(cohort,i)))
index<-index+1
}

lower<-lower+0.001
names(lower)<-names(svec)
exp(lower)							# check lower bounds

#########################
##Optimization with mle2
#########################

## use bounded optimization

m1 = mle2(mfun,start=svec,
   method="L-BFGS-B",lower=lower,upper=10,		# lower=must be larger than 
"sum(odiag(x))"<-if smaller NANs are produced
   vecpar=TRUE,
   data=list(S=S,h=h,cohort=cohort))

coef(m1)

Nest<-exp(coef(m1))					# estimated cohort sizes for row and colum 1

############################################################################################
############################################################################################

#####################
##Analyse the Goodness of fit
#####################

Nya<-matrix(rep(0,length(cohort)),ncol=L)		# project the N estimates into a 
matrix
Nya[1,1:L]<-Nest[R:length(Nest)]
Nya[2:R,1]<-rev(Nest[1:(R-1)])


stot<-(1-Q[1:(a-1)])*P[2:a]				# derived survival -> product of (1-h(i)) amd 
s(i)
smat<-matrix(rep(stot,R),ncol=L-1,byrow=T)

for (i in 1:ncol(smat)){				# project the N estimates trough time
Nya[2:R,i+1]<-Nya[1:(R-1),i]*smat[1:(R-1),i]
}


#######################
##Expected harvest values compared with observed harvest numbers
#######################

exp<-Nya*h
obs<-cohort

gof<-((obs-exp)^2)/exp					# Goodness of fit for the single 
estimates							# Overall GOF statistic
gof.stat<-sum(gof)					# GOF statistic
p.value<-1-pchisq(gof.stat,(length(obs)-1))	# p-value with appropriate 
degrees of freedom
p.value

plot(obs~exp,main="Observed harvest numbers vs expected from the 
model",xlim=c(0,200),ylim=c(0,200))
abline(0,1)

##################
##Plots
##################
##Plot true population trend <-> compare with estimated trend

plot(rowSums(N.true)~seq(1:y),ylim=c(0,5000),col="red",pch=17,main="True vs 
estimated population trend")
lines(rowSums(N.true)~seq(1:y),col="red")

points(rowSums(Nya)~seq(1:y),ylim=c(0,5000),col="blue",pch=16)
lines(rowSums(Nya)~seq(1:y),col="blue")

###############################################################
###############################################################
##########
##END#####
##########

Thanks a lot for the help!!!!

Benedikt



On Tue, 9 Mar 2010 15:34:09 +0000 (UTC)
  Ben Bolker <bolker at ufl.edu> wrote:
> Benedikt Gehr <benedikt.gehr <at> ieu.uzh.ch> writes:
> 
>> 
>> Hi there
>> 
>> I am using mle2 for a multinomial likelihood optimization problem. My 
>> function works fine when I'm using simulated data, however my cell 
>> probabilities of the true data for the multinomial likelihood are 
>> sometimes very small (in some cases <0.00...) and the estimated point 
>> estimates fit the true vlaues quite poorly. Is there a way how to handle 
>> near zero probabilities in maximum likelihood optimization?
>> 
> 
>  Hard to say without more detail.  Can you send a reproducible
> example (your data, or a small subset of your data, or
> some way of simulating the data that *does* create the problem)?
> 
>  Since you're using log-likelihoods already (within mle2) the
> problem is unlikely (?) to be numerical -- R doesn't have any
> problem with very large negative log-likelihoods.  Do your
> likelihood profiles look reasonable?  If so then the problem
> is more likely that your model doesn't fit the data well than
> that you are having convergence problems.
> 
>  Have you considered the possibility of overdispersion
> (non-homogeneity/non-independence), e.g. via a Dirichlet-multinomial
> model?
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
>http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list