<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Re: [R] fitting a mixture of distributions with optim and max log likelihood ? Clarifications </TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Hi-</FONT>
<BR><FONT SIZE=2> Professor Ripley raised some questions on my first email:</FONT>
<BR><FONT SIZE=2> Are you on XP?</FONT>
<BR><FONT SIZE=2> I am on Release candidate 1 from MSDN professional subscriptions, actually.</FONT>
<BR><FONT SIZE=2>Anyone (although perhaps only in US) can</FONT>
<BR><FONT SIZE=2> get about the same thing from the MicroSoft Preview program for about $20</FONT>
<BR><FONT SIZE=2>US. (This release is time limited). cygwin works</FONT>
<BR><FONT SIZE=2> for me so far but I only use the shell, shell scripts, awk and sed on a</FONT>
<BR><FONT SIZE=2>regular basis (gcc and g77 are okay as far as I have used</FONT>
<BR><FONT SIZE=2>them) and the cygwin installer happly updated my windows XP verison</FONT>
</P>
<P><FONT SIZE=2>What were warnings?</FONT>
<BR><FONT SIZE=2>lWarning messages:</FONT>
<BR><FONT SIZE=2>1: NaNs produced in: log(x)</FONT>
<BR><FONT SIZE=2>...</FONT>
<BR><FONT SIZE=2>22: NaNs produced in: log(x)</FONT>
</P>
<P><FONT SIZE=2>I really want is to fit mixtures of 2 or 3 lognormal distributions that are</FONT>
<BR><FONT SIZE=2>discrete to the nearest 0.001 (gold assay</FONT>
<BR><FONT SIZE=2>values in ounces/ton) with both a normal error of about + or - 0.003 and a</FONT>
<BR><FONT SIZE=2>proportional error of about + or - 5%</FONT>
<BR><FONT SIZE=2>constrained to be non negative so I think analytical derivatives are out the</FONT>
<BR><FONT SIZE=2>window. </FONT>
</P>
<P><FONT SIZE=2>Could you point me to a reference on log-sds?</FONT>
</P>
<P><FONT SIZE=2>thanx</FONT>
</P>
<P><FONT SIZE=2>bob</FONT>
</P>
<P><FONT SIZE=2>Thread follows:</FONT>
</P>
<BR>
<BR>
<BR>
<P><FONT SIZE=2>>>> Prof Brian D Ripley 08/28/01 03:10PM >>></FONT>
<BR><FONT SIZE=2>On Tue, 28 Aug 2001, Bob Sandefur wrote:</FONT>
</P>
<P><FONT SIZE=2>> hi</FONT>
<BR><FONT SIZE=2>> Suppose I have a mixture of 2 distributions generated by</FONT>
</P>
<P><FONT SIZE=2>This is well-known difficult optimization problem with many local maxima.</FONT>
<BR><FONT SIZE=2>I think in particular that your test example has too broad a second</FONT>
<BR><FONT SIZE=2>normal for this to work well.</FONT>
</P>
<P><FONT SIZE=2>The default method for optim() is slow, not very precise but very reliable.</FONT>
<BR><FONT SIZE=2>I very rarely use it. With a problem like this where derivatives are</FONT>
<BR><FONT SIZE=2>readily available I would BFGS with analytical derivatives and multiple</FONT>
<BR><FONT SIZE=2>starting points to check for local maxima.</FONT>
</P>
<P><FONT SIZE=2>BTw, what are those warnings? You actually have a constrained</FONT>
<BR><FONT SIZE=2>space for the optimization, and I would have worked with log-sds</FONT>
<BR><FONT SIZE=2>to avoid that.</FONT>
</P>
<P><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> rtwonormals <- function(npnt,m1,s1,m2,s2,p2){</FONT>
<BR><FONT SIZE=2>> rv<-vector(npnt,mode="numeric")</FONT>
<BR><FONT SIZE=2>> for( i in seq(1:npnt)){</FONT>
<BR><FONT SIZE=2>> if(runif(1,0,1)<=p2){</FONT>
<BR><FONT SIZE=2>> rv[i]<-rnorm(1,m2,s2)</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>> else{</FONT>
<BR><FONT SIZE=2>> rv[i]<-rnorm(1,m1,s1)</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>> return(rv)</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>> x <- rtwonormals(50000,0,100,500,500,0.05)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> #and I try to fit these with (based on thread: [R] Estimating Weibull</FONT>
<BR><FONT SIZE=2>Distribution Parameters - very basic question)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> loglike<-function(p)</FONT>
<BR><FONT SIZE=2>-2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4])))</FONT>
<BR><FONT SIZE=2>> optim(c(-20,150,400,600,.035),loglike)</FONT>
<BR><FONT SIZE=2>> optim(c(20,70,600,400,.095),loglike)</FONT>
<BR><FONT SIZE=2>> optim(c(0,100,500,500,.05),loglike)</FONT>
<BR><FONT SIZE=2>> optim(c(-20,150,400,600,.035),loglike)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> # three different starting values (1 and 4 the same to check reproducablity)</FONT>
<BR><FONT SIZE=2>I get:</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Version 1.3.0 (2001-06-22) on windoze XP</FONT>
</P>
<P><FONT SIZE=2>Interesting. I thought XP was to be released on Oct 25, but there were</FONT>
<BR><FONT SIZE=2>rumours that MicroSoft had changed this. Do you really have XP, or</FONT>
<BR><FONT SIZE=2>a preview?</FONT>
</P>
<P><FONT SIZE=2>If this really does run under the real XP, we should add that to the</FONT>
<BR><FONT SIZE=2>readme etc.</FONT>
</P>
<P><FONT SIZE=2>Quite a few things, including our Cygwin tools, do not run under XP,</FONT>
<BR><FONT SIZE=2>allegedly.</FONT>
</P>
<P><FONT SIZE=2>> > optim(c(-20,150,400,600,.035),loglike)</FONT>
<BR><FONT SIZE=2>> $par</FONT>
<BR><FONT SIZE=2>> [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778</FONT>
<BR><FONT SIZE=2>> $value</FONT>
<BR><FONT SIZE=2>> [1] 622843.1</FONT>
<BR><FONT SIZE=2>> $counts</FONT>
<BR><FONT SIZE=2>> function gradient</FONT>
<BR><FONT SIZE=2>> 493 NA</FONT>
<BR><FONT SIZE=2>> $convergence</FONT>
<BR><FONT SIZE=2>> [1] 0</FONT>
<BR><FONT SIZE=2>> $message</FONT>
<BR><FONT SIZE=2>> NULL</FONT>
<BR><FONT SIZE=2>> There were 22 warnings (use warnings() to see them)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> > optim(c(20,70,600,400,.095),loglike)</FONT>
<BR><FONT SIZE=2>> $par</FONT>
<BR><FONT SIZE=2>> [1] 0.62742812 100.15891023 533.25825184 514.63882147 0.04670099</FONT>
<BR><FONT SIZE=2>> $value</FONT>
<BR><FONT SIZE=2>> [1] 622692.7</FONT>
<BR><FONT SIZE=2>> $couns</FONT>
<BR><FONT SIZE=2>> function gradient</FONT>
<BR><FONT SIZE=2>> 501 NA</FONT>
<BR><FONT SIZE=2>> $convergence</FONT>
<BR><FONT SIZE=2>> [1] 1 < OPPS</FONT>
<BR><FONT SIZE=2>> $message</FONT>
<BR><FONT SIZE=2>> NULL</FONT>
<BR><FONT SIZE=2>> There were 22 warnings (use warnings() to see them)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> > optim(c(0,100,500,500,.05),loglike)</FONT>
<BR><FONT SIZE=2>> $par</FONT>
<BR><FONT SIZE=2>> [1] 0.56254342 100.03881574 499.47434961 505.59785487 0.04805347</FONT>
<BR><FONT SIZE=2>> $value</FONT>
<BR><FONT SIZE=2>> [1] 622685</FONT>
<BR><FONT SIZE=2>> $counts</FONT>
<BR><FONT SIZE=2>> function gradient</FONT>
<BR><FONT SIZE=2>> 109 NA</FONT>
<BR><FONT SIZE=2>> $convergence</FONT>
<BR><FONT SIZE=2>> [1] 0</FONT>
<BR><FONT SIZE=2>> $message</FONT>
<BR><FONT SIZE=2>> NULL</FONT>
<BR><FONT SIZE=2>> There were 21 warnings (use warnings() to see them)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> > optim(c(-20,150,400,600,.035),loglike)</FONT>
<BR><FONT SIZE=2>> $par</FONT>
<BR><FONT SIZE=2>> [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778</FONT>
<BR><FONT SIZE=2>> $value</FONT>
<BR><FONT SIZE=2>> [1] 622843.1</FONT>
<BR><FONT SIZE=2>> $counts</FONT>
<BR><FONT SIZE=2>> function gradient</FONT>
<BR><FONT SIZE=2>> 493 NA</FONT>
<BR><FONT SIZE=2>> $convergence</FONT>
<BR><FONT SIZE=2>> [1] 0</FONT>
<BR><FONT SIZE=2>> $message</FONT>
<BR><FONT SIZE=2>> NULL</FONT>
<BR><FONT SIZE=2>> There were 22 warnings (use warnings() to see them)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Questions:</FONT>
<BR><FONT SIZE=2>> 1) Did I mess up anything in the formulae?</FONT>
<BR><FONT SIZE=2>> 2) Any suggestions for converging to the same value?</FONT>
<BR><FONT SIZE=2>> 3) Any suggestions for other methods to get means, stddevs and proportions</FONT>
<BR><FONT SIZE=2>of the mixture of distributions?</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Thanx</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> Robert (Bob) L Sandefur</FONT>
<BR><FONT SIZE=2>> Principal Geostatistician</FONT>
<BR><FONT SIZE=2>> Pincock Allen & Holt (A Hart Crowser Company)</FONT>
<BR><FONT SIZE=2>> International Minerals Consultants</FONT>
<BR><FONT SIZE=2>> 274 Union Suite 200</FONT>
<BR><FONT SIZE=2>> Lakewood CO 80228</FONT>
<BR><FONT SIZE=2>> USA</FONT>
<BR><FONT SIZE=2>> 303 914-4467 v</FONT>
<BR><FONT SIZE=2>> 303 987-8907 f</FONT>
<BR><FONT SIZE=2>> rls@pincock.com </FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> ~</FONT>
<BR><FONT SIZE=2>> ~</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-</FONT>
<BR><FONT SIZE=2>> r-help mailing list -- Read <A HREF="http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html">http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html</A> </FONT>
<BR><FONT SIZE=2>> Send "info", "help", or "[un]subscribe"</FONT>
<BR><FONT SIZE=2>> (in the "body", not the subject !) To: r-help-request@stat.math.ethz.ch </FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._</FONT>
<BR><FONT SIZE=2>></FONT>
</P>
<P><FONT SIZE=2>-- </FONT>
<BR><FONT SIZE=2>Brian D. Ripley, ripley@stats.ox.ac.uk </FONT>
<BR><FONT SIZE=2>Professor of Applied Statistics, <A HREF="http://www.stats.ox.ac.uk/~ripley/">http://www.stats.ox.ac.uk/~ripley/</A> </FONT>
<BR><FONT SIZE=2>University of Oxford, Tel: +44 1865 272861 (self)</FONT>
<BR><FONT SIZE=2>1 South Parks Road, +44 1865 272860 (secr)</FONT>
<BR><FONT SIZE=2>Oxford OX1 3TG, UK Fax: +44 1865 272595</FONT>
</P>
</BODY>
</HTML>