[R] mixture distribution analisys

Fri Apr 11 01:53:21 CEST 2008

Antonio Olinto wrote:
> Hello,
> 
> I'm analyzing some fish length-frequency data to access relative age and growth
> information and I would like to do something different from FiSAT / ELEFAN analysis.
> 
> I found a package named MIX in http://www.math.mcmaster.ca/peter/mix/mix.html
> but it's compiled for R 2.0.0
> 
> So I have two questions:
> Is there any problem to run this package in R 2.7.0? - probably yes …
> And, is there any other package in R that can be used to analyze and to separate
> mixture distributions?
> 
> Thanks for any help. Best regards,
> 
> Antonio Olinto
> Sao Paulo Fisheries Institute
> Brazil

This problem is not too difficult. Maybe you can write your own code.
Say, you enter the number of fish you measured, n, and a two-column 
dataframe with the mid point of the length class in the first column 
(call it l) and the observed relative frequency of each length class in 
the second column (call it obs_prob). Then using the multinomial 
likelihood for the number of fish in each length class as the random 
variable (the approach in Peter Macdonald's Mix), minimize
log_likelihood=-n*sum(elementwise product(obs_prob,log(pred_prob)))
where
pred_prob=(p1*0.3989423/s1)*exp(-0.5*square((l-m1)/s1))
     +(p2*0.3989423/s2)*exp(-0.5*square((l-m2)/s2))
     +(p3*0.3989423/s3)*exp(-0.5*square((l-m3)/s3))
where s1, s2, s3, m1, m2, m3, p1 and p2 (p3=1-p1+p2) are the parameters.
Do you know your number of components (cohorts) in the mixture?
In the model above it is three. If you don't know it, try several models 
with different number of components and the AIC may let you know which 
one is the best working model. Don't use the likelihood ratio test as 
one of the p parameters is on the boundary of parameter space if the 
null is true.
Also, you should bound the parameters (m1<m2<m3, 0<p1<1, 0<p2<1, and 
establish the condition p3=1-p1+p2.
I wrote ADMB code to do this if you want it. You can translate it to R.
Below you can find some example code to do the graphics.
R.


cont <- 
c(4,15,32,44,62,69,61,99,114,119,106,108,89,88,95,99,91,84,137,190,224,297,368,484,566,629,446,349,377,405,438,367,215,115,36,10,4,1,0,0,1)
tot <- sum(cont)
rel.freq <- rep(cont,each=10)/tot/10
lect <- seq(9.1,50,0.1)
prop1 <- 0.0219271
prop2 <- 0.121317
prop3 <- 0.0328074
prop4 <- 0.315534
prop5 <- 0.203071
prop6 <- 0.3053439
sigma1 <- 1.50760
sigma2 <- 2.82352
sigma3 <- 1.40698
sigma4 <- 3.00063
sigma5 <- 1.41955
sigma6 <- 1.99940
media1 <- 13.4148
media2 <- 19.2206
media3 <- 24.5748
media4 <- 31.9498
media5 <- 34.6998
media6 <- 39.7016
prot1<-(1/10)*(prop1*(1/sqrt(2*pi))/sigma1)*exp(-0.5*((lect-(media1-0.5))/sigma1)^2)
prot2<-(1/10)*(prop2*(1/sqrt(2*pi))/sigma2)*exp(-0.5*((lect-(media2-0.5))/sigma2)^2)
prot3<-(1/10)*(prop3*(1/sqrt(2*pi))/sigma3)*exp(-0.5*((lect-(media3-0.5))/sigma3)^2)
prot4<-(1/10)*(prop4*(1/sqrt(2*pi))/sigma4)*exp(-0.5*((lect-(media4-0.5))/sigma4)^2)
prot5<-(1/10)*(prop5*(1/sqrt(2*pi))/sigma5)*exp(-0.5*((lect-(media5-0.5))/sigma5)^2)
prot6<-(1/10)*(prop6*(1/sqrt(2*pi))/sigma6)*exp(-0.5*((lect-(media6-0.5))/sigma6)^2)
prot.tot<-prot1+prot2+prot3+prot4+prot5+prot6
plot(lect,rel.freq,type="l",lwd=3,xlab="",ylab="",ylim=c(0,0.01))
lines(lect,prot1)
lines(lect,prot2)
lines(lect,prot3)
lines(lect,prot4)
lines(lect,prot5)
lines(lect,prot6)
lines(lect,prot.tot,lwd=2)