[R] Fitting Mixture distributions
Bert Gunter
bgunter.4567 at gmail.com
Tue Sep 13 02:18:56 CEST 2016
Do you mean "increase the convergence value"? Decreasing it should
make it harder to converge (I believe, depending on exactly how
"convergence value" is defined, so double-check).
-- Bert
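
P.S. A rough base-R sketch of the point about the convergence value. It assumes the stopping rule is "stop once the gain in log-likelihood drops below epsilon"; check the package documentation for the actual definition. The function em_two_normals() and the simulated data are made up for illustration and are not the mixtools code.

```r
# Minimal EM for a two-component 1-D normal mixture (illustration only).
em_two_normals <- function(x, epsilon, maxit = 5000) {
  lambda <- 0.5
  mu <- as.numeric(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x), 2)
  loglik_old <- -Inf
  for (iter in seq_len(maxit)) {
    # Weighted component densities at the current parameters
    d1 <- lambda * dnorm(x, mu[1], sigma[1])
    d2 <- (1 - lambda) * dnorm(x, mu[2], sigma[2])
    loglik <- sum(log(d1 + d2))
    # Assumed stopping rule: log-likelihood gain below epsilon
    if (loglik - loglik_old < epsilon) break
    loglik_old <- loglik
    # E-step: posterior probability of component 1
    z <- d1 / (d1 + d2)
    # M-step: weighted updates of weight, means and standard deviations
    lambda <- mean(z)
    mu <- c(sum(z * x) / sum(z), sum((1 - z) * x) / sum(1 - z))
    sigma <- c(sqrt(sum(z * (x - mu[1])^2) / sum(z)),
               sqrt(sum((1 - z) * (x - mu[2])^2) / sum(1 - z)))
  }
  list(iterations = iter, loglik = loglik,
       lambda = lambda, mu = mu, sigma = sigma)
}

set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(100, mean = 3, sd = 1))

loose <- em_two_normals(x, epsilon = 1e-2)
tight <- em_two_normals(x, epsilon = 1e-8)

# The tighter tolerance can never stop earlier than the looser one
c(loose = loose$iterations, tight = tight$iterations)
```

Both runs follow the same sequence of iterates, so a smaller epsilon can only mean more iterations before the rule is met, never fewer.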
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, Sep 12, 2016 at 4:40 PM, Aanchal Sharma
<aanchalsharma833 at gmail.com> wrote:
> Thanks for the reply.
>
> I have another related issue with a Gamma mixture model. Here is the
> description:
>
> I am trying to fit a 2-component gamma mixture model to my data (residual
> values obtained after running a Generalized Linear Model), using the
> following command (part of the code):
>
> expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25),
>     k = 2, epsilon = 1e-08, maxit = 1000, maxrestarts = 20, verb = TRUE)
>
> The code runs for multiple gene files (in a loop). It runs fine for some
> files, whereas for others it throws the following error:
>
> Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,
> : Try different number of components?
>
> I tried increasing iterations and decreasing the convergence value, but that
> doesn't seem to work. Is there anything else that I can try?
> Thanks
>
>
> On Thu, Sep 8, 2016 at 8:38 AM, Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>>
>> >>>>> Bert Gunter <bgunter.4567 at gmail.com>
>> >>>>> on Wed, 7 Sep 2016 23:47:40 -0700 writes:
>>
>> > "please suggest what can I do to resolve this
>> > issue."
>>
>> > Fitting normal mixtures can be difficult, and sometimes the
>> > optimization algorithm (EM) will get stuck with very slow
>> > convergence. Presumably there are options in the package to either
>> > increase the max number of steps before giving up or make the
>> > convergence criteria less sensitive. The former will increase the
>> > run time and the latter will reduce the optimality (possibly
>> > leaving you farther from the true optimum). So you should look
>> > into changing these as you think appropriate.
>>
>> I'm jumping in late, without having read everything preceding.
>>
>> One of the last messages seemed to indicate that you are looking
>> at mixtures of *one*-dimensional gaussians.
>>
>> If this is the case, I strongly recommend looking at (my) CRAN
>> package 'nor1mix' (the "1" is for "*one*-dimensional").
>>
>> For a while now, that small package has provided an alternative
>> to the EM, namely direct MLE, simply using optim(<likelihood>) where the
>> likelihood uses a somewhat smart parametrization.
>>
>> Of course, *as the EM*, this also depends on the starting value,
>> but my (limited) experience has been that
>> nor1mix::norMixMLE()
>> works considerably faster and more reliably than the EM (which I
>> also provide as nor1mix::norMixEM()).
>>
>> Apropos 'starting value': The help page shows how to use
>> kmeans() for "somewhat" reliable starts; alternatively, I'd
>> recommend using cluster::pam() to get a start there.
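
For readers without nor1mix at hand, here is a base-R sketch of the two ideas above: k-means starting values, then direct maximization of the likelihood with optim(). The logit/log parametrization below is only one simple choice and is not claimed to be the one nor1mix uses. The simulated data and the names start, negloglik and est are made up for illustration.

```r
set.seed(42)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.7))

# Starting values from a 2-cluster k-means partition, ordered by centre
km <- kmeans(x, centers = 2, nstart = 10)
ord <- order(km$centers[, 1])
start <- list(
  lambda = as.numeric(table(km$cluster)[ord] / length(x)),
  mu     = as.numeric(km$centers[ord, 1]),
  sigma  = as.numeric(tapply(x, km$cluster, sd)[ord])
)

# Negative log-likelihood on an unconstrained scale:
# logit for the mixing weight, log for the standard deviations
negloglik <- function(par, x) {
  lambda <- plogis(par[1])
  mu <- par[2:3]
  sigma <- exp(par[4:5])
  -sum(log(lambda * dnorm(x, mu[1], sigma[1]) +
           (1 - lambda) * dnorm(x, mu[2], sigma[2])))
}

par0 <- c(qlogis(start$lambda[1]), start$mu, log(start$sigma))
fit <- optim(par0, negloglik, x = x, method = "BFGS")

# Back-transform to the natural scale
est <- list(lambda = plogis(fit$par[1]),
            mu     = fit$par[2:3],
            sigma  = exp(fit$par[4:5]))
est
```

The same start list could presumably also be handed to an EM routine that accepts starting values, for instance through the lambda, mu and sigma arguments of normalmixEM, which would avoid choosing them by hand for each dataset.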
>>
>> I'm glad to hear about experiences using these / comparing
>> these with other approaches.
>>
>> Martin
>>
>>
>> --
>> Martin Maechler,
>> ETH Zurich
>>
>>
>> > On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
>> > <aanchalsharma833 at gmail.com> wrote:
>> >> Hi Simon
>> >>
>> >> I am facing the same problem as described above. I am trying to fit
>> >> a Gaussian mixture model to my data using normalmixEM. I am running
>> >> an R script which has this function running as part of it for about
>> >> 17000 datasets (in a loop). The script runs fine for some datasets,
>> >> but it terminates when it encounters one dataset with the following
>> >> error:
>> >>
>> >> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2, :
>> >> Too many tries!
>> >>
>> >> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals,
>> >> lambda = c(0.75, 0.25), k = 2, epsilon = 1e-08, maxit = 10000,
>> >> maxrestarts = 200, verb = TRUE))
>> >> (expr_glm_residuals is my dataset, which has residual values for
>> >> different samples)
>> >>
>> >> It is suggested that one should define mu and sigma in the command
>> >> by looking at the dataset. But in my case there are many datasets,
>> >> and the values will keep changing every time. Please suggest what
>> >> can I do to resolve this issue.
>> >>
>> >> Regards
>> >> Anchal
>> >>
>> >> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
>> >>>
>> >>> Hi Tjun Kiat Teo,
>> >>>
>> >>> you are trying to fit a Normal mixture to some data. The Normal
>> >>> mixture is very delicate when it comes to parameter search: if the
>> >>> variance of one component gets closer and closer to zero (with that
>> >>> component's mean sitting on a data point), the log-likelihood
>> >>> becomes larger and larger for any values of the remaining
>> >>> parameters. Furthermore, the EM algorithm is known to sometimes
>> >>> take very long to reach convergence.
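
To make that degeneracy concrete, here is a small base-R sketch (not mixtools code; the data vector and the helper mix_loglik() are made up for illustration). It pins one component's mean on a data point and shrinks that component's standard deviation. The log-likelihood keeps growing, which is presumably the situation behind the "One of the variances is going to zero" message.

```r
# Small, fixed data set so the result does not depend on a random seed
x <- seq(-2, 2, by = 0.5)

# Log-likelihood of a two-component normal mixture
mix_loglik <- function(x, lambda, mu, sigma) {
  sum(log(lambda * dnorm(x, mu[1], sigma[1]) +
          (1 - lambda) * dnorm(x, mu[2], sigma[2])))
}

# Component 1 sits exactly on the first observation; shrink its sd
sds <- 10^-(1:6)
ll <- sapply(sds, function(s) {
  mix_loglik(x, lambda = 0.5, mu = c(x[1], 0), sigma = c(s, 1))
})

# The log-likelihood grows without bound as sigma1 goes to zero
data.frame(sigma1 = sds, loglik = ll)
```

Each tenfold reduction of the standard deviation adds roughly log(10) to the log-likelihood here, so there is no finite maximum for an optimizer to settle on.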
>> >>>
>> >>> Try the following:
>> >>>
>> >>> Use as starting values for the component parameters:
>> >>>
>> >>> start.par <- mean(your.data, na.rm = TRUE) +
>> >>>     sd(your.data, na.rm = TRUE) * runif(K)
>> >>>
>> >>> For the weights just use either 1/K or the R cluster function with
>> >>> K clusters.
>> >>>
>> >>> Here K is the number of components. Further, enlarge the maximum
>> >>> number of iterations. What you could also try is to randomize the
>> >>> start parameters and run an SEM (Stochastic EM). In my opinion, the
>> >>> better method in this case is a Bayesian one: MCMC.
>> >>>
>> >>>
>> >>> Best
>> >>>
>> >>> Simon
>> >>>
>> >>>
>> >>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
>> >>>
>> >>> > I was trying to use normalmixEM in mixtools and I got this error
>> >>> > message:
>> >>> >
>> >>> > One of the variances is going to zero; trying new starting values.
>> >>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
>> >>> >
>> >>> > Are there any other packages for fitting mixture distributions?
>> >>> >
>> >>> >
>> >>> > Tjun Kiat Teo
>> >>> >
>> >>> > ______________________________________________
>> >>> > R-h... at r-project.org mailing list
>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> > PLEASE do read the posting guide
>> >>> > http://www.R-project.org/posting-guide.html
>> >>> > and provide commented, minimal, self-contained, reproducible code.
>> >>>
>
>
>
>
> --
> Anchal Sharma, PhD
> Postdoctoral Fellow
> 195, Little Albany street,
> Cancer Institute of New Jersey
> Rutgers University
> NJ-08901