[R] Fitting Mixture distributions

Bert Gunter bgunter.4567 at gmail.com
Tue Sep 13 02:18:56 CEST 2016


Do you mean "increase the convergence value"? Decreasing it should
make it harder to converge (I believe, depending on exactly how
"convergence value" is defined, so double-check).

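To see why a smaller tolerance is harder to satisfy, here is a minimal
base-R sketch. It is not the mixtools implementation: em_two_normals() is
a hypothetical helper, and it assumes the tolerance is compared against
the change in log-likelihood between successive EM iterations. The
simulated data and starting values are illustrative only.

```r
# Minimal EM for a two-component 1-D normal mixture (base R only).
# Hypothetical helper for illustration; not code from mixtools.
em_two_normals <- function(x, epsilon, maxit = 10000) {
  # Deterministic starting values taken from the data quartiles
  mu     <- as.numeric(quantile(x, c(0.25, 0.75)))
  sigma  <- rep(sd(x), 2)
  lambda <- c(0.5, 0.5)
  # Observed-data log-likelihood at the current parameter values
  loglik <- function() {
    sum(log(lambda[1] * dnorm(x, mu[1], sigma[1]) +
            lambda[2] * dnorm(x, mu[2], sigma[2])))
  }
  ll_old <- loglik()
  for (iter in seq_len(maxit)) {
    # E-step: posterior probability that each point belongs to component 1
    d1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
    p1 <- d1 / (d1 + d2)
    p2 <- 1 - p1
    # M-step: weighted updates of weights, means and standard deviations
    lambda <- c(mean(p1), mean(p2))
    mu     <- c(sum(p1 * x) / sum(p1), sum(p2 * x) / sum(p2))
    sigma  <- c(sqrt(sum(p1 * (x - mu[1])^2) / sum(p1)),
                sqrt(sum(p2 * (x - mu[2])^2) / sum(p2)))
    ll_new <- loglik()
    # Stop once the log-likelihood gain drops below the tolerance
    if (abs(ll_new - ll_old) < epsilon) break
    ll_old <- ll_new
  }
  list(iterations = iter, loglik = ll_new,
       mu = mu, sigma = sigma, lambda = lambda)
}

# Simulated data: two overlapping normal components (assumed example)
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(100, mean = 3, sd = 1))

loose <- em_two_normals(x, epsilon = 1e-2)
tight <- em_two_normals(x, epsilon = 1e-8)
c(loose = loose$iterations, tight = tight$iterations)
```

Both runs follow the same sequence of EM updates, so the tighter
tolerance can only stop at the same iteration or later. If the limit on
iterations is hit first, the tighter setting is the one that fails.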
-- Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Mon, Sep 12, 2016 at 4:40 PM, Aanchal Sharma
<aanchalsharma833 at gmail.com> wrote:
> Thanks for the reply.
>
> I have another related issue with a Gamma mixture model. Here is the
> description:
>
> I am trying to fit a 2-component gamma mixture model to my data (residual
> values obtained after running a Generalized Linear Model), using the
> following command (part of the code):
>
>  expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25),
>                               k = 2, epsilon = 1e-08, maxit = 1000,
>                               maxrestarts = 20, verb = TRUE)
>
> The code runs for multiple gene files (in a loop). It runs fine for some
> files, whereas for others it throws the following error:
>
>     Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2, :
>       Try different number of components?
>
> I tried increasing the number of iterations and decreasing the convergence
> value, but that doesn't seem to work. Is there anything else that I can try?
> Thanks
>
>
> On Thu, Sep 8, 2016 at 8:38 AM, Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>>
>> >>>>> Bert Gunter <bgunter.4567 at gmail.com>
>> >>>>>     on Wed, 7 Sep 2016 23:47:40 -0700 writes:
>>
>>     > "please suggest what can I do to resolve this
>>     > issue."
>>
>>     > Fitting normal mixtures can be difficult, and sometimes the
>>     > optimization algorithm (EM) will get stuck with very slow
>>     > convergence. Presumably there are options in the package to either
>>     > increase the max number of steps before giving up or make the
>>     > convergence criteria less sensitive. The former will increase the
>>     > run time and the latter will reduce the optimality (possibly leaving
>>     > you farther from the true optimum). So you should look into changing
>>     > these as you think appropriate.
>>
>> I'm jumping in late, without having read everything preceding.
>>
>> One of the last messages seemed to indicate that you are looking
>> at mixtures of *one*-dimensional gaussians.
>>
>> If this is the case, I strongly recommend looking at (my) CRAN
>> package 'nor1mix' (the "1" is for "*one*-dimensional").
>>
>> For a while now, that small package has provided an alternative
>> to the EM, namely direct MLE, simply using optim(<likelihood>) where the
>> likelihood uses a somewhat smart parametrization.
>>
>> Of course, *as the EM*, this also depends on the starting value,
>> but my (limited) experience has been that
>>   nor1mix::norMixMLE()
>> works considerably faster and more reliably than the EM (which I
>> also provide as nor1mix::norMixEM()).
>>
>> Apropos 'starting value': The help page shows how to use
>> kmeans() for "somewhat" reliable starts; alternatively, I'd
>> recommend using cluster::pam() to get a start there.
>>
>> I'm glad to hear about experiences using these / comparing
>> these with other approaches.
>>
>> Martin
>>
>>
>> --
>> Martin Maechler,
>> ETH Zurich
>>
>>
>>     > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
>>     > <aanchalsharma833 at gmail.com> wrote:
>>     >> Hi Simon
>>     >>
>>     >> I am facing the same problem as described above. I am trying to fit
>>     >> a Gaussian mixture model to my data using normalmixEM. I am running
>>     >> an R script that calls this function for about 17000 datasets (in a
>>     >> loop). The script runs fine for some datasets, but it terminates when
>>     >> it encounters one dataset with the following error:
>>     >>
>>     >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  :
>>     >> Too many tries!
>>     >>
>>     >> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals,
>>     >> lambda = c(0.75, 0.25), k = 2, epsilon = 1e-08, maxit = 10000,
>>     >> maxrestarts = 200, verb = TRUE))
>>     >> (expr_glm_residuals is my dataset, which has residual values for
>>     >> different samples)
>>     >>
>>     >> It is suggested that one should define mu and sigma in the command
>>     >> by looking at the dataset. But in my case there are many datasets,
>>     >> and they will keep changing every time. Please suggest what I can do
>>     >> to resolve this issue.
>>     >>
>>     >> Regards
>>     >> Anchal
>>     >>
>>     >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
>>     >>>
>>     >>> Hi Tjun Kiat Teo,
>>     >>>
>>     >>> you are trying to fit a Normal mixture to some data. The Normal
>>     >>> mixture is very delicate when it comes to parameter search: if the
>>     >>> variance of one component gets closer and closer to zero (with its
>>     >>> mean sitting on a data point), the log-likelihood becomes larger
>>     >>> and larger for any values of the remaining parameters. Furthermore,
>>     >>> the EM algorithm is known to sometimes take very long to converge.
>>     >>>
>>     >>> Try the following:
>>     >>>
>>     >>> Use as starting values for the component parameters:
>>     >>>
>>     >>> start.par <- mean(your.data, na.rm = TRUE) +
>>     >>>     sd(your.data, na.rm = TRUE) * runif(K)
>>     >>>
>>     >>> For the weights, just use either 1/K or the R cluster function with
>>     >>> K clusters.
>>     >>>
>>     >>> Here K is the number of components. Further, enlarge the maximum
>>     >>> number of iterations. What you could also try is to randomize the
>>     >>> start parameters and run an SEM (Stochastic EM). In my opinion, the
>>     >>> better method in this case is a Bayesian one: MCMC.
>>     >>>
>>     >>>
>>     >>> Best
>>     >>>
>>     >>> Simon
>>     >>>
>>     >>>
>>     >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com>
>>     >>> wrote:
>>     >>>
>>     >>> > I was trying to use normalmixEM in mixtools and I got this error
>>     >>> > message:
>>     >>> >
>>     >>> > One of the variances is going to zero;  trying new starting values.
>>     >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
>>     >>> >
>>     >>> > Are there any other packages for fitting mixture distributions?
>>     >>> >
>>     >>> >
>>     >>> > Tjun Kiat Teo
>>     >>> >
>>     >>> > ______________________________________________
>>     >>> > R-h... at r-project.org mailing list
>>     >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>     >>> > PLEASE do read the posting guide
>>     >>> > http://www.R-project.org/posting-guide.html
>>     >>> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
> Anchal Sharma, PhD
> Postdoctoral Fellow
> 195, Little Albany street,
> Cancer Institute of New Jersey
> Rutgers University
> NJ-08901


