[R] Fitting Mixture distributions

Aanchal Sharma aanchalsharma833 at gmail.com
Wed Sep 14 00:46:58 CEST 2016


Yes, I stated it incorrectly; I increased the value. That did not help
either. What helped was removing some samples that had zero (or close to zero)
values, so that error is now resolved.
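
To illustrate the filtering step, here is a minimal base-R sketch. The helper
name drop_near_zero and the cutoff tol are assumptions made for illustration,
not the code or threshold actually used. A gamma density has strictly positive
support, so the sketch also drops negative and non-finite values.

```r
# Hypothetical helper: keep only finite values strictly above a small cutoff.
# `tol` is an illustrative threshold, not a value taken from this thread.
drop_near_zero <- function(x, tol = 1e-8) {
  x[is.finite(x) & x > tol]
}

# Toy input with a zero, a near-zero value and a missing value.
x <- c(0, 1e-12, 0.5, 2.3, NA, 4.1)
x_clean <- drop_near_zero(x)
print(x_clean)
```

The cleaned vector can then be passed to the fitting call in place of the raw
residuals.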

But there is another problem.
For one of the genes it throws the following error:

iteration = 1  log-lik diff = NaN  log-lik = NaN
Error in while (diff > epsilon && iter < maxit) { :
  missing value where TRUE/FALSE needed

It seems that EM is not able to calculate the log-lik value (it is NaN) even
at the first iteration. Any idea why that can happen?
It works fine for the other genes in the loop. I tried looking for differences
in the inputs, but could not find anything striking.
Thanks for the consistent inputs.
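
A note on the message itself: "missing value where TRUE/FALSE needed" is what
R reports when a while() condition evaluates to NA, and with diff equal to NaN
the comparison diff > epsilon is NA. One possible cause (an assumption, not
something confirmed in this thread) is an input the gamma density cannot
handle, such as non-positive or non-finite residuals. The sketch below
summarises such properties for one input and wraps the fit so that a failing
gene does not stop the loop. The names diagnose_input, safe_fit and gene_data
are hypothetical.

```r
# Summarise properties of an input vector that may break a gamma fit.
diagnose_input <- function(x) {
  c(n           = length(x),
    n_nonfinite = sum(!is.finite(x)),
    n_nonpos    = sum(x <= 0, na.rm = TRUE),
    n_distinct  = length(unique(x[is.finite(x)])))
}

# Call a fitting function; on error, report the message and return NULL
# instead of terminating the surrounding loop.
safe_fit <- function(fit_fun, x, ...) {
  tryCatch(fit_fun(x, ...),
           error = function(e) {
             message("fit failed: ", conditionMessage(e))
             NULL
           })
}

# Usage in a per-gene loop (assumes mixtools is installed and that
# `gene_data` is a named list of residual vectors; both are assumptions):
# for (g in names(gene_data)) {
#   print(diagnose_input(gene_data[[g]]))
#   fit <- safe_fit(mixtools::gammamixEM, gene_data[[g]],
#                   lambda = c(0.75, 0.25), k = 2, verb = TRUE)
#   if (is.null(fit)) next
# }
```

Comparing the diagnose_input() output of the failing gene with that of a gene
that fits may show what differs between the inputs.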

On Mon, Sep 12, 2016 at 8:18 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

> Do you mean "increase the convergence value"? Decreasing it should
> make it harder to converge (I believe, depending on exactly how
> "convergence value" is defined, so double-check).
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Sep 12, 2016 at 4:40 PM, Aanchal Sharma
> <aanchalsharma833 at gmail.com> wrote:
> > Thanks for the reply.
> >
> > I have another related issue with a gamma mixture model. Here is the
> > description:
> >
> > I am trying to fit a 2-component gamma mixture model to my data (residual
> > values obtained after running a generalized linear model), using the
> > following command (part of the code):
> >
> >  expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75,0.25),
> >    k = 2, epsilon = 1e-08, maxit = 1000, maxrestarts=20, verb = TRUE)
> >
> > The code runs for multiple gene files (in a loop). It runs fine for some
> > files, whereas for others it throws the following error:
> >
> >     Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,
> > : Try different number of components?
> >
> > I tried increasing iterations and decreasing the convergence value, but
> > that doesn't seem to work. Is there anything else that I can try?
> > Thanks
> >
> >
> > On Thu, Sep 8, 2016 at 8:38 AM, Martin Maechler <maechler at stat.math.ethz.ch>
> > wrote:
> >>
> >> >>>>> Bert Gunter <bgunter.4567 at gmail.com>
> >> >>>>>     on Wed, 7 Sep 2016 23:47:40 -0700 writes:
> >>
> >>     > "please suggest what can I do to resolve this
> >>     > issue."
> >>
> >>     > Fitting normal mixtures can be difficult, and sometimes the
> >>     > optimization algorithm (EM) will get stuck with very slow convergence.
> >>     > Presumably there are options in the package to either increase the max
> >>     > number of steps before giving up or make the convergence criteria less
> >>     > sensitive. The former will increase the run time and the latter will
> >>     > reduce the optimality (possibly leaving you farther from the true
> >>     > optimum). So you should look into changing these as you think
> >>     > appropriate.
> >>
> >> I'm jumping in late, without having read everything preceding.
> >>
> >> One of the last messages seemed to indicate that you are looking
> >> at mixtures of *one*-dimensional gaussians.
> >>
> >> If this is the case, I strongly recommend looking at (my) CRAN
> >> package 'nor1mix' (the "1" is for "*one*-dimensional").
> >>
> >> For a while now that small package has provided an alternative
> >> to the EM, namely direct MLE, simply using optim(<likelihood>) where the
> >> likelihood uses a somewhat smart parametrization.
> >>
> >> Of course, *like the EM*, this also depends on the starting value,
> >> but my (limited) experience has been that
> >>   nor1mix::norMixMLE()
> >> works considerably faster and more reliably than the EM (which I
> >> also provide as    nor1mix::norMixEM() ).
> >>
> >> Apropos 'starting value': The help page shows how to use
> >> kmeans() for "somewhat" reliable starts; alternatively, I'd
> >> recommend using cluster::pam() to get a start there.
> >>
> >> I'm glad to hear about experiences using these / comparing
> >> these with other approaches.
> >>
> >> Martin
> >>
> >>
> >> --
> >> Martin Maechler,
> >> ETH Zurich
> >>
> >>
> >>     > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
> >>     > <aanchalsharma833 at gmail.com> wrote:
> >>     >> Hi Simon
> >>     >>
> >>     >> I am facing the same problem as described above. I am trying to fit a
> >>     >> gaussian mixture model to my data using normalmixEM. I am running an
> >>     >> Rscript which has this function running as part of it for about 17000
> >>     >> datasets (in a loop).
> >>     >> The script runs fine for some datasets, but it terminates when it
> >>     >> encounters one dataset with the following error:
> >>     >>
> >>     >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  :
> >>     >> Too many tries!
> >>     >>
> >>     >> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals, lambda =
> >>     >> c(0.75,0.25), k = 2, epsilon = 1e-08, maxit = 10000, maxrestarts=200,
> >>     >> verb = TRUE))
> >>     >> (expr_glm_residuals is my dataset which has residual values for
> >>     >> different samples)
> >>     >>
> >>     >> It is suggested that one should define the mu and sigma in the command
> >>     >> by looking at the dataset. But in my case there are many datasets and
> >>     >> they will keep on changing every time. Please suggest what I can do to
> >>     >> resolve this issue.
> >>     >>
> >>     >> Regards
> >>     >> Anchal
> >>     >>
> >>     >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
> >>     >>>
> >>     >>> Hi Tjun Kiat Teo,
> >>     >>>
> >>     >>> You are trying to fit a Normal mixture to some data. The Normal mixture
> >>     >>> is very delicate when it comes to parameter search: if a component
> >>     >>> variance gets closer and closer to zero, the log-likelihood becomes
> >>     >>> larger and larger for any values of the remaining parameters.
> >>     >>> Furthermore, the EM algorithm is known to sometimes take very long
> >>     >>> to reach convergence.
> >>     >>>
> >>     >>> Try the following:
> >>     >>>
> >>     >>> Use as starting values for the component parameters:
> >>     >>>
> >>     >>> start.par <- mean(your.data, na.rm = TRUE) + sd(your.data, na.rm = TRUE) * runif(K)
> >>     >>>
> >>     >>> For the weights just use either 1/K or the R cluster function with K
> >>     >>> clusters.
> >>     >>>
> >>     >>> Here K is the number of components. Further, enlarge the maximum number
> >>     >>> of iterations. What you could also try is to randomize the start
> >>     >>> parameters and run an SEM (Stochastic EM). In my opinion the better
> >>     >>> method in this case is a Bayesian method: MCMC.
> >>     >>>
> >>     >>>
> >>     >>> Best
> >>     >>>
> >>     >>> Simon
> >>     >>>
> >>     >>>
> >>     >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
> >>     >>>
> >>     >>> > I was trying to use normalmixEM in mixtools and I got this error
> >>     >>> > message:
> >>     >>> >
> >>     >>> > One of the variances is going to zero;  trying new starting values.
> >>     >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
> >>     >>> >
> >>     >>> > Are there any other packages for fitting mixture distributions?
> >>     >>> >
> >>     >>> >
> >>     >>> > Tjun Kiat Teo
> >>     >>> >
> >>     >>> > ______________________________________________
> >>     >>> > R-h... at r-project.org mailing list
> >>     >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>     >>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>     >>> > and provide commented, minimal, self-contained, reproducible code.



-- 
Anchal Sharma, PhD
Postdoctoral Fellow
195, Little Albany street,
Cancer Institute of New Jersey
Rutgers University
NJ-08901
