[R] Fitting Mixture distributions

Tue Sep 13 01:40:15 CEST 2016

Thanks for the reply.

I have another, related issue with a gamma mixture model. Here is the
description:

I am trying to fit a 2-component gamma mixture model to my data (residual
values obtained after running a generalized linear model), using the
following command (part of the code):

    expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25),
                                 k = 2, epsilon = 1e-08, maxit = 1000,
                                 maxrestarts = 20, verb = TRUE)

The code runs for multiple gene files (in a loop). It runs fine for some
files, whereas for others it throws the following error:

    Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,
 : Try different number of components?

I tried increasing the number of iterations (maxit) and decreasing the
convergence tolerance (epsilon), but that did not help. Is there anything
else that I can try?
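
In the meantime, to keep the loop from terminating at the first failing
file, something like the following might serve. It is only a minimal
sketch: safe_fit() and toy_fit() are hypothetical names of my own, not
mixtools functions, and the toy fitter merely imitates a call that errors
on some inputs.

```r
# Hypothetical helper (not part of mixtools): run a fitting function and
# return NULL instead of stopping when it signals an error.
safe_fit <- function(fit_fun, x, ...) {
  tryCatch(
    fit_fun(x, ...),
    error = function(e) {
      message("fit failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Toy stand-in for a fitter; it fails on non-positive input purely so that
# the example has one success and one failure.
toy_fit <- function(x, k = 2) {
  if (any(x <= 0)) stop("Try different number of components?")
  list(k = k, mean = mean(x))
}

# Two made-up datasets: the first fits, the second triggers the error.
datasets <- list(a = c(1.2, 0.7, 3.1), b = c(-0.4, 0.9, 2.0))
fits <- lapply(datasets, safe_fit, fit_fun = toy_fit, k = 2)

# Names of the datasets whose fit failed, for later inspection.
failed <- names(fits)[vapply(fits, is.null, logical(1))]
```

With mixtools loaded, the same wrapper would presumably be called as
safe_fit(gammamixEM, expr_glm_residuals, lambda = c(0.75, 0.25), k = 2),
so that failing files are recorded rather than aborting the whole run.
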
Thanks

On Thu, Sep 8, 2016 at 8:38 AM, Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

> >>>>> Bert Gunter <bgunter.4567 at gmail.com>
> >>>>>     on Wed, 7 Sep 2016 23:47:40 -0700 writes:
>
>     > "please suggest what can I do to resolve this
>     > issue."
>
>     > Fitting normal mixtures can be difficult, and sometimes the
>     > optimization algorithm (EM) will get stuck with very slow
>     > convergence.
>     > Presumably there are options in the package to either increase the
>     > max number of steps before giving up or make the convergence
>     > criteria less sensitive. The former will increase the run time and
>     > the latter will reduce the optimality (possibly leaving you farther
>     > from the true optimum). So you should look into changing these as
>     > you think appropriate.
>
> I'm jumping in late, without having read everything preceding.
>
> One of the last messages seemed to indicate that you are looking
> at mixtures of *one*-dimensional gaussians.
>
> If this is the case, I strongly recommend looking at (my) CRAN
> package 'nor1mix' (the "1" is for "*one*-dimensional").
>
> For a while now, that small package has been providing an alternative
> to the EM, namely direct MLE, simply using optim(<likelihood>) where the
> likelihood uses a somewhat smart parametrization.
>
> Of course, *like the EM*, this also depends on the starting value,
> but my (limited) experience has been that
>   nor1mix::norMixMLE()
> works considerably faster and more reliably than the EM (which I
> also provide, as nor1mix::norMixEM()).
>
> Apropos 'starting value': The help page shows how to use
> kmeans() for "somewhat" reliable starts; alternatively, I'd
> recommend using cluster::pam() to get a start there.
>
> I'm glad to hear about experiences using these / comparing
> these with other approaches.
>
> Martin
>
>
> --
> Martin Maechler,
> ETH Zurich
>
>
>     > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
>     > <aanchalsharma833 at gmail.com> wrote:
>     >> Hi Simon
>     >>
>     >> I am facing the same problem as described above. I am trying to fit
>     >> a Gaussian mixture model to my data using normalmixEM. I am running
>     >> an R script that calls this function for about 17000 datasets (in a
>     >> loop). The script runs fine for some datasets, but it terminates
>     >> when it encounters one dataset with the following error:
>     >>
>     >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  :
>     >> Too many tries!
>     >>
>     >> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals,
>     >> lambda = c(0.75,0.25), k = 2, epsilon = 1e-08, maxit = 10000,
>     >> maxrestarts=200, verb = TRUE))
>     >> (expr_glm_residuals is my dataset, which has residual values for
>     >> different samples)
>     >>
>     >> It is suggested that one should define mu and sigma in the command
>     >> by looking at the dataset. But in my case there are many datasets,
>     >> and the values would change every time. Please suggest what can I
>     >> do to resolve this issue.
>     >>
>     >> Regards
>     >> Anchal
>     >>
>     >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
>     >>>
>     >>> Hi Tjun Kiat Teo,
>     >>>
>     >>> you are trying to fit a Normal mixture to some data. The Normal
>     >>> mixture is very delicate when it comes to parameter search: if the
>     >>> variance of a component gets closer and closer to zero, the
>     >>> log-likelihood becomes larger and larger for any values of the
>     >>> remaining parameters. Furthermore, the EM algorithm is known to
>     >>> sometimes take very long to reach convergence.
>     >>>
>     >>> Try the following:
>     >>>
>     >>> Use as starting values for the component parameters:
>     >>>
>     >>> start.par <- mean(your.data, na.rm = TRUE) +
>     >>>     sd(your.data, na.rm = TRUE) * runif(K)
>     >>>
>     >>> For the weights just use either 1/K or the R cluster function with
>     >>> K clusters
>     >>>
>     >>> Here K is the number of components. Further, enlarge the maximum
>     >>> number of iterations. What you could also try is to randomize the
>     >>> start parameters and run an SEM (Stochastic EM). In my opinion the
>     >>> better method in this case is a Bayesian one: MCMC.
>     >>>
>     >>>
>     >>> Best
>     >>>
>     >>> Simon
>     >>>
>     >>>
>     >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
>     >>>
>     >>> > I was trying to use normalmixEM in mixtools and I got this error
>     >>> > message:
>     >>> >
>     >>> > One of the variances is going to zero;  trying new starting values.
>     >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
>     >>> >
>     >>> > Are there any other packages for fitting mixture distributions?
>     >>> >
>     >>> >
>     >>> > Tjun Kiat Teo
>     >>> >
>     >>> >         [[alternative HTML version deleted]]
>     >>> >
>     >>> > ______________________________________________
>     >>> > R-h... at r-project.org mailing list
>     >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>     >>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>     >>> > and provide commented, minimal, self-contained, reproducible code.

-- 
Anchal Sharma, PhD
Postdoctoral Fellow
195, Little Albany street,
Cancer Institute of New Jersey
Rutgers University
NJ-08901
