[R] using near-zero probabilities in optimization
Benedikt Gehr, Institut für Evolutionsbiologie und Umweltwissenschaften
benedikt.gehr at ieu.uzh.ch
Wed Mar 10 10:18:55 CET 2010
Dear Ben
I haven't looked at the profiles yet, I'm not sure how to.
I'm doing population reconstruction from age-at-harvest data. I'm using a
multinomial distribution to model the observed harvest numbers.
But I didn't know about the Dirichlet-Multinomial so far. I think that might
really be a good option because overdispersion is likeliy to be an issue
here.
Below I have copied my code for the simulation of data (the first example
works, the other gives poor fit) and subsequent optimization. I'm sorry the
code is very long, but I didn't know how to short it down.
The problem:
Q=harvest rate -> if harvest rates are too small the model fits the data
very bad.
Here is the R code:
library(boot)
library(bbmle)
library(demogR)
library(msm)
##########################################
##########################################
##Stochastic model -> DATA SIMULATION
##########################################
a<-8 # nr of age classes
y<-30 # nr of years
m<-c(rep(0.2,2),rep(0.75,4),0.5,0.2) # productivity -> nr of
young/female
m.time<-matrix(rtnorm(a*y,m,0.1,lower=0),ncol=a,byrow=T)
############################################################
##TWO VERSIONS OF Q=HARVEST RATE->Q1 WORKS, Q2 GIVES BAD FIT
############################################################
Q<-c(rep(0.2,2),rep(0.3,3),0.4,0.6,0.9) #WORKS # true mean harvest
mortality rates
Q<-c(rep(0.02,2),rep(0.03,3),0.4,0.6,0.9) #BAD FIT # true mean harvest
mortality rates
Q.time<-matrix(rtnorm(a*y,Q,0.06,lower=0,upper=1),ncol=a,byrow=T) # adding
random noise
#############################################################
#############################################################
P<-c(0.5,rep(0.9,4),rep(0.8,2),0.7) # natural survival rates
P.time<-matrix(rtnorm(a*y,P,0.06,lower=0,upper=1),ncol=a,byrow=T) # adding
random noise
L.time<-(1-Q.time[,1:(a-1)])*P.time[,2:a] # Overall survival -> Derived
parameter
F.time<-m.time*P.time[,1] # Fecundity -> Derived parameter
Lm.time<-array(0,c(a,a,y)) # Lesli-matrix-array -> for each year a new
matrix
for (i in 1:y){
Lm.time[,,i]<-odiag(L.time[i,],-1) # survival in the off-diagonal
}
for (i in 1:y){ # fecundity in the first row
Lm.time[1,,i]<-F.time[i,]
}
Hm.time<-array(0,c(a,a,y)) # Harvest-matrix-array -> for each year a
new matrix
for (i in 1:y){
Hm.time[,,i]<-odiag(Q.time[i,]) # mortality rates in the diagonal
}
N1<-rep(200,a) # Define the starting vector of cohort sizes
N.true<-matrix(rep(0,a*y),ncol=a) # True abundance matrix
N.true[1,]<-N1
C<-matrix(rep(0,a*y),ncol=a) # Harvest matrix
for (i in 1:(y-1)){
N.true[i+1,]<-Lm.time[,,i]%*%N.true[i,]
}
N.true<-round(N.true) # round the counts of the true population
for (i in 1:y){
C[i,]<-Hm.time[,,i]%*%N.true[i,]
}
C<-round(C) #round the counts of the harvest numbers
plot(rowSums(N.true)~seq(1:y),ylim=c(0,4000),col="red") # Plot the time
series of the simulated data
lines(rowSums(N.true)~seq(1:y),col="red") # true population and true
harvest numbers
points(rowSums(C)~seq(1:y),ylim=c(0,4000),col="blue")
lines(rowSums(C)~seq(1:y),col="blue")
###########################################################################
###########################################################################
############################
##ANALYSIS OF SIMULATED DATA
############################
##Simulated harvest numbers of a cohort harvested over 30 years (including
random noise)
pop<-N.true # True population size
cohort<-C # harvest matrix
R<-nrow(cohort)
L<-ncol(cohort)
h<-matrix(rep(Q,R),ncol=a,byrow=T) # estimated harvest rates
S<-matrix(rep(P[-1],R),ncol=a-1,byrow=T) # survival rates
##############################################
##Function to calculate the cell probabilities
##############################################
predprob <- function(S=S,h=h) {
R<-nrow(cohort)
L<-ncol(cohort)
p <- matrix(rep(0,(R)*(L)),ncol=L) # matrix of cell probabilities
p[R,1]<-h[R,1]
p[1,L]<-h[1,L]
ay<-rep(0,L+R+1) # vector of last cell probabilities ->
1-sum(rest));Gove(2002)
ay[L+R]<-1-h[1,L]
ay[2]<-1-h[R,1]
index<-3 # ay[1]=Null, ay[2]=1-h[R,1],ay[L+R]=1-h[1,L],
ay[L+R+1]=Null
for (i in -(R-2):(L-2)){ # Calculating the cell prob after Gove (2002)
p[row(p) == col(p) - i] <-
odiag(h,i)*c(1,cumprod((1-odiag(h[1:(R-1),1:(L-1)],i))
*odiag(S[1:(R-1),],i)))
ay[index]<-1-sum(odiag(p[1:R,],i))
index<-index+1
}
p<-rbind(p,ay[1:L])
p<-cbind(p,ay[length(ay):(L+1)])
return(p)
}
predprob(S,h) #test the function
####################################
##Function to calculate the multinomial ->from Ben Bolker
####################################
## multinom WITHOUT ROUNDING & without error-checking
dmnom <- function(x,prob,log=FALSE) {
r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
if (log) r else exp(r)
}
###########################
## function to be optimized
###########################
mfun <- function(logN,S=S,h=h,cohort=cohort) {
L<-ncol(cohort)
R<-nrow(cohort)
d<-matrix(rep(NA,(R+1)*(L+1)),ncol=L+1) # last row and last col of the
matrix
d[1:R,1:L]<-log(cohort) # are the values to be optimized->Log
tranformed data!
index<-1
index1<-(R+L)
for (i in -(R-1):(L-1)){ # values for the top row
if (i<= -(R-L) ){
d[(R+1),(index+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))} else
if (i> -(R-L)& i<1 ){
d[index1,(L+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))} else
if (i>0){
d[index1,(L+1)]<-log(exp(logN[index])-sum(odiag(cohort,i)))}
index<-index+1
index1<-index1-1
}
pp<-predprob(S,h)
lhood<-0 # The Likelihood function -> to be maximized
for (i in -(R-1):(L-1)){
lhood<-lhood+(-dmnom(exp(odiag(d,i)),prob=odiag(pp,i),log=TRUE))
}
return(lhood)
}
logN<-log(rep(1000,(R+L-1))) # Test the function with test-values
mfun(logN,S,h,cohort)
##########################################################################################
##########################################################################################
######################################
##Starting values for the optimization
######################################
N<-rep(500,(R+L-1)) # optimization,->N-x1-x2-x3
svec = log(N) # transform to avoid constraints (x>0)
names(svec) <- paste("logN",1:(R+L-1),sep="")
parnames(mfun) <- names(svec)
######################################
##Lower bounds for the L-BFGS-B-method
######################################
index<-1
lower<-rep(0,(R+L-1))
for (i in -(R-1):(L-1)){
lower[index]<-log(sum(odiag(cohort,i)))
index<-index+1
}
lower<-lower+0.001
names(lower)<-names(svec)
exp(lower) # check lower bounds
#########################
##Optimization with mle2
#########################
## use bounded optimization
m1 = mle2(mfun,start=svec,
method="L-BFGS-B",lower=lower,upper=10, # lower=must be larger than
"sum(odiag(x))"<-if smaller NANs are produced
vecpar=TRUE,
data=list(S=S,h=h,cohort=cohort))
coef(m1)
Nest<-exp(coef(m1)) # estimated cohort sizes for row and colum 1
############################################################################################
############################################################################################
#####################
##Analyse the Goodness of fit
#####################
Nya<-matrix(rep(0,length(cohort)),ncol=L) # project the N estimates into a
matrix
Nya[1,1:L]<-Nest[R:length(Nest)]
Nya[2:R,1]<-rev(Nest[1:(R-1)])
stot<-(1-Q[1:(a-1)])*P[2:a] # derived survival -> product of (1-h(i)) amd
s(i)
smat<-matrix(rep(stot,R),ncol=L-1,byrow=T)
for (i in 1:ncol(smat)){ # project the N estimates trough time
Nya[2:R,i+1]<-Nya[1:(R-1),i]*smat[1:(R-1),i]
}
#######################
##Expected harvest values compared with observed harvest numbers
#######################
exp<-Nya*h
obs<-cohort
gof<-((obs-exp)^2)/exp # Goodness of fit for the single
estimates # Overall GOF statistic
gof.stat<-sum(gof) # GOF statistic
p.value<-1-pchisq(gof.stat,(length(obs)-1)) # p-value with appropriate
degrees of freedom
p.value
plot(obs~exp,main="Observed harvest numbers vs expected from the
model",xlim=c(0,200),ylim=c(0,200))
abline(0,1)
##################
##Plots
##################
##Plot true population trend <-> compare with estimated trend
plot(rowSums(N.true)~seq(1:y),ylim=c(0,5000),col="red",pch=17,main="True vs
estimated population trend")
lines(rowSums(N.true)~seq(1:y),col="red")
points(rowSums(Nya)~seq(1:y),ylim=c(0,5000),col="blue",pch=16)
lines(rowSums(Nya)~seq(1:y),col="blue")
###############################################################
###############################################################
##########
##END#####
##########
Thanks a lot for the help!!!!
Benedikt
On Tue, 9 Mar 2010 15:34:09 +0000 (UTC)
Ben Bolker <bolker at ufl.edu> wrote:
> Benedikt Gehr <benedikt.gehr <at> ieu.uzh.ch> writes:
>
>>
>> Hi there
>>
>> I am using mle2 for a multinomial likelihood optimization problem. My
>> function works fine when I'm using simulated data, however my cell
>> probabilities of the true data for the multinomial likelihood are
>> sometimes very small (in some cases <0.00...) and the estimated point
>> estimates fit the true vlaues quite poorly. Is there a way how to handle
>> near zero probabilities in maximum likelihood optimization?
>>
>
> Hard to say without more detail. Can you send a reproducible
> example (your data, or a small subset of your data, or
> some way of simulating the data that *does* create the problem)?
>
> Since you're using log-likelihoods already (within mle2) the
> problem is unlikely (?) to be numerical -- R doesn't have any
> problem with very large negative log-likelihoods. Do your
> likelihood profiles look reasonable? If so then the problem
> is more likely that your model doesn't fit the data well than
> that you are having convergence problems.
>
> Have you considered the possibility of overdispersion
> (non-homogeneity/non-independence), e.g. via a Dirichlet-multinomial
> model?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list