[R] Fitting Mixture distributions
Aanchal Sharma
aanchalsharma833 at gmail.com
Tue Sep 13 01:40:15 CEST 2016
Thanks for the reply.
I have another related issue, this time with a gamma mixture model. Here is
the description:
I am trying to fit a 2-component gamma mixture model to my data (residual
values obtained after running a generalized linear model), using the
following command (part of the code):
expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25),
                             k = 2, epsilon = 1e-08, maxit = 1000,
                             maxrestarts = 20, verb = TRUE)
The code runs in a loop over multiple gene files. It runs fine for some
files, whereas for others it throws the following error:
Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2, :
  Try different number of components?
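A loop over many gene files does not have to stop at the first failed fit. As a minimal sketch in base R: wrapping each fit in tryCatch() turns an error into a recorded failure, so the remaining files still run and the failing ones can be inspected afterwards. The helper name safe_fit and the stand-in fitting function fake_fit are hypothetical; in the real script the fitting function would be the gammamixEM call shown above.

```r
# Hypothetical helper: run any fitting function, return NULL on error.
safe_fit <- function(fit_fun, x, ...) {
  tryCatch(
    fit_fun(x, ...),
    error = function(e) {
      message("fit failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for the real fitting call (assumption, for illustration only):
# it fails on data containing non-positive values and otherwise returns a list.
fake_fit <- function(x) {
  if (any(x <= 0)) stop("Try different number of components?")
  list(n = length(x), mean = mean(x))
}

datasets <- list(gene_a = c(1.2, 0.4, 2.5), gene_b = c(-0.3, 0.8, 1.1))
fits <- lapply(datasets, function(x) safe_fit(fake_fit, x))

# Names of the datasets whose fit failed, for later inspection.
failed <- names(fits)[vapply(fits, is.null, logical(1))]
```

Collecting the failures this way keeps the successful fits and leaves a short list of datasets to look at by hand.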
I tried increasing the number of iterations (maxit) and decreasing the
convergence tolerance (epsilon), but neither helped. Is there anything else
that I can try?
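Two things that may be worth checking, offered as assumptions about the data rather than a diagnosis: a gamma density is defined only for positive values, whereas residuals commonly include zeros or negative values; and, as suggested further down this thread for the normal case, data-driven starting values (for example from kmeans()) may help. The sketch below uses base R only. The function name gamma_mix_starts is hypothetical, and the shape and scale values come from a simple method-of-moments calculation within each k-means cluster, not from anything in mixtools.

```r
# Hypothetical helper: data-driven start values for a k-component gamma mixture.
gamma_mix_starts <- function(x, k = 2) {
  x <- x[is.finite(x)]
  # Gamma densities have support on (0, Inf) only.
  if (any(x <= 0)) {
    stop("data contain non-positive values; a gamma mixture cannot be fitted as is")
  }
  # Split the data into k groups to get rough per-component summaries.
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  m <- tapply(x, cl, mean)
  v <- tapply(x, cl, var)
  list(
    lambda = as.numeric(table(cl)) / length(x),  # mixing weights
    shape  = as.numeric(m^2 / v),                # method of moments: mean^2 / var
    scale  = as.numeric(v / m)                   # method of moments: var / mean
  )
}

# Simulated example data (assumption, for illustration only).
set.seed(1)
x <- c(rgamma(300, shape = 2, scale = 1), rgamma(100, shape = 50, scale = 0.5))
starts <- gamma_mix_starts(x, k = 2)
```

The resulting weights could then be supplied as the lambda argument; whether and how the shape and scale values map onto the fitting function's own starting-value arguments should be checked against its help page.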
Thanks
On Thu, Sep 8, 2016 at 8:38 AM, Martin Maechler <maechler at stat.math.ethz.ch>
wrote:
> >>>>> Bert Gunter <bgunter.4567 at gmail.com>
> >>>>> on Wed, 7 Sep 2016 23:47:40 -0700 writes:
>
> > "please suggest what can I do to resolve this
> > issue."
>
> > Fitting normal mixtures can be difficult, and sometimes the
> > optimization algorithm (EM) will get stuck with very slow convergence.
> > Presumably there are options in the package to either increase the max
> > number of steps before giving up or make the convergence criteria less
> > sensitive. The former will increase the run time and the latter will
> > reduce the optimality (possibly leaving you farther from the true
> > optimum). So you should look into changing these as you think
> > appropriate.
>
> I'm jumping in late, without having read everything preceding.
>
> One of the last messages seemed to indicate that you are looking
> at mixtures of *one*-dimensional gaussians.
>
> If this is the case, I strongly recommend looking at (my) CRAN
> package 'nor1mix' (the "1" is for "*one*-dimensional").
>
> For a while now, that small package has provided an alternative
> to the EM, namely direct MLE, simply using optim(<likelihood>) where the
> likelihood uses a somewhat smart parametrization.
>
> Of course, *like the EM*, this also depends on the starting value,
> but my (limited) experience has been that
> nor1mix::norMixMLE()
> works considerably faster and more reliably than the EM (which I
> also provide as nor1mix::norMixEM()).
>
> Apropos 'starting value': The help page shows how to use
> kmeans() for "somewhat" reliable starts; alternatively, I'd
> recommend using cluster::pam() to get a start there.
>
> I'm glad to hear about experiences using these / comparing
> these with other approaches.
>
> Martin
>
>
> --
> Martin Maechler,
> ETH Zurich
>
>
> > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
> > <aanchalsharma833 at gmail.com> wrote:
> >> Hi Simon
> >>
> >> I am facing the same problem as described above. I am trying to fit a
> >> Gaussian mixture model to my data using normalmixEM. I am running an R
> >> script that calls this function in a loop over about 17000 datasets.
> >> The script runs fine for some datasets, but it terminates when it
> >> encounters one dataset with the following error:
> >>
> >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2, :
> >>   Too many tries!
> >>
> >> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals,
> >> lambda = c(0.75, 0.25), k = 2, epsilon = 1e-08, maxit = 10000,
> >> maxrestarts = 200, verb = TRUE))
> >> (expr_glm_residuals is my dataset, which has residual values for
> >> different samples)
> >>
> >> It is suggested that one should define mu and sigma in the command by
> >> looking at the dataset. But in my case there are many datasets, and the
> >> values will change every time. Please suggest what I can do to resolve
> >> this issue.
> >>
> >> Regards
> >> Anchal
> >>
> >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
> >>>
> >>> Hi Tjun Kiat Teo,
> >>>
> >>> You are trying to fit a normal mixture to some data. The normal mixture
> >>> is very delicate when it comes to parameter search: if the variance of
> >>> a component gets closer and closer to zero, the log-likelihood becomes
> >>> larger and larger for any values of the remaining parameters.
> >>> Furthermore, the EM algorithm is known to sometimes take very long to
> >>> converge.
> >>>
> >>> Try the following:
> >>>
> >>> Use as starting values for the component parameters:
> >>>
> >>> start.par <- mean(your.data, na.rm = TRUE) + sd(your.data, na.rm = TRUE) * runif(K)
> >>>
> >>> For the weights, just use either 1/K or the R cluster function with K
> >>> clusters.
> >>>
> >>> Here K is the number of components. Also increase the maximum number
> >>> of iterations. You could also try randomizing the start parameters and
> >>> running an SEM (Stochastic EM). In my opinion, the better method in
> >>> this case is a Bayesian one: MCMC.
> >>>
> >>>
> >>> Best
> >>>
> >>> Simon
> >>>
> >>>
> >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
> >>>
> >>> > I was trying to use normalmixEM in mixtools and I got this error
> >>> > message:
> >>> >
> >>> > One of the variances is going to zero; trying new starting values.
> >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
> >>> >
> >>> > Are there any other packages for fitting mixture distributions?
> >>> >
> >>> >
> >>> > Tjun Kiat Teo
> >>> >
> >>> > [[alternative HTML version deleted]]
> >>> >
> >>> > ______________________________________________
> >>> > R-h... at r-project.org mailing list
> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> > and provide commented, minimal, self-contained, reproducible code.
--
Anchal Sharma, PhD
Postdoctoral Fellow
195, Little Albany street,
Cancer Institute of New Jersey
Rutgers University
NJ-08901