[R] Fitting Mixture distributions
Martin Maechler
maechler at stat.math.ethz.ch
Thu Sep 8 14:38:10 CEST 2016
>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>> on Wed, 7 Sep 2016 23:47:40 -0700 writes:
> "please suggest what can I do to resolve this
> issue."
> Fitting normal mixtures can be difficult, and sometimes the
> optimization algorithm (EM) will get stuck with very slow convergence.
> Presumably there are options in the package to either increase the max
> number of steps before giving up or make the convergence criteria less
> sensitive. The former will increase the run time and the latter will
> reduce the optimality (possibly leaving you farther from the true
> optimum). So you should look into changing these as you think
> appropriate.
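Whatever settings are chosen, a loop over many datasets (as in the question quoted further down) need not stop at the first failing fit. The following is only a minimal sketch: fit_all() and toy_fit() are hypothetical names made up for illustration and are not part of mixtools; toy_fit() merely stands in for the real fitting function.

```r
## Apply fit_fun to every dataset; on error, keep the message and move on.
fit_all <- function(datasets, fit_fun, ...) {
  lapply(datasets, function(x) {
    tryCatch(fit_fun(x, ...),
             error = function(e)
               structure(conditionMessage(e), class = "fit_failure"))
  })
}

## Stand-in fitter (an assumption, for illustration): fails on constant data.
toy_fit <- function(x) {
  if (sd(x) == 0) stop("Too many tries!")
  c(mean = mean(x), sd = sd(x))
}

set.seed(42)
datasets <- list(a = rnorm(50), b = rep(1, 50), c = rnorm(50, 3))
res <- fit_all(datasets, toy_fit)

## Which datasets failed? These can be inspected or refitted afterwards.
failed <- vapply(res, inherits, logical(1), what = "fit_failure")
print(failed)
```

With mixtools installed, the real fitter could presumably be passed in the same way, e.g. fit_all(datasets, mixtools::normalmixEM, k = 2, maxit = 10000), so that the failing datasets are collected instead of terminating the script.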
I'm jumping in late, without having read everything preceding.
One of the last messages seemed to indicate that you are looking
at mixtures of *one*-dimensional gaussians.
If this is the case, I strongly recommend looking at (my) CRAN
package 'nor1mix' (the "1" is for "*one*-dimensional").
For a while now, that small package has provided an alternative
to the EM, namely direct MLE, simply using optim(<likelihood>), where the
likelihood uses a somewhat smart parametrization.
Of course, *like the EM*, this also depends on the starting value,
but my (limited) experience has been that
nor1mix::norMixMLE()
works considerably faster and more reliably than the EM (which I
also provide as nor1mix::norMixEM()).
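To illustrate the idea of direct MLE via optim(), here is a minimal base-R sketch for two components. It is not the code of nor1mix: the parametrization used here (logit of the weight, log of the standard deviations) is an assumption chosen for simplicity, and the names nll_mix2() and fit_mix2() are hypothetical.

```r
## Negative log-likelihood of a two-component 1-D normal mixture.
## theta = (logit(w1), mu1, mu2, log(sigma1), log(sigma2)) is unconstrained,
## so optim() cannot propose a weight outside (0, 1) or a negative sd.
nll_mix2 <- function(theta, x) {
  w1    <- plogis(theta[1])
  mu    <- theta[2:3]
  sigma <- exp(theta[4:5])
  dens  <- w1 * dnorm(x, mu[1], sigma[1]) +
           (1 - w1) * dnorm(x, mu[2], sigma[2])
  -sum(log(dens))
}

fit_mix2 <- function(x, maxit = 500) {
  ## Crude start: equal weights, means at the quartiles, common sd.
  start <- c(0, quantile(x, c(0.25, 0.75), names = FALSE),
             rep(log(sd(x) / 2), 2))
  opt <- optim(start, nll_mix2, x = x, method = "BFGS",
               control = list(maxit = maxit))
  w1 <- plogis(opt$par[1])
  list(w = c(w1, 1 - w1), mu = opt$par[2:3], sigma = exp(opt$par[4:5]),
       logLik = -opt$value, convergence = opt$convergence)
}

## Simulated example data (an assumption, not from this thread).
set.seed(1)
x <- c(rnorm(750, mean = 0, sd = 1), rnorm(250, mean = 4, sd = 0.7))
fit <- fit_mix2(x)
str(fit)
```

With the package installed, the corresponding call should be along the lines of nor1mix::norMixMLE(x, 2); see ?norMixMLE for the actual arguments.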
Apropos 'starting value': The help page shows how to use
kmeans() for "somewhat" reliable starts; alternatively, I'd
recommend using cluster::pam() to get a start there.
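A minimal sketch of deriving starting values from a kmeans() partition, using base R only. This is not copied from the help page; the helper name kmeans_start() and the simulated data are assumptions for illustration.

```r
## Starting values (weights, means, sds) for a K-component 1-D normal
## mixture, computed from a k-means partition of the data.
kmeans_start <- function(x, K, nstart = 10) {
  km  <- kmeans(x, centers = K, nstart = nstart)
  ord <- order(km$centers[, 1])      # cluster labels sorted by center
  cl  <- match(km$cluster, ord)      # relabel 1..K by increasing mean
  list(lambda = as.vector(table(factor(cl, levels = seq_len(K)))) / length(x),
       mu     = as.vector(tapply(x, cl, mean)),
       sigma  = as.vector(tapply(x, cl, sd)))  # NA for a one-point cluster
}

## Simulated example data (an assumption, not from this thread).
set.seed(1)
x  <- c(rnorm(750, mean = 0, sd = 1), rnorm(250, mean = 4, sd = 0.7))
st <- kmeans_start(x, K = 2)
str(st)
```

Because the starts are computed per dataset, nothing has to be tuned by hand when looping over many datasets; the values could be handed to a fitter that accepts them, such as the lambda, mu and sigma arguments of mixtools::normalmixEM(). Replacing km$cluster by cluster::pam(x, K)$clustering should give the pam() variant.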
I'd be glad to hear about experiences using these / comparing
these with other approaches.
Martin
--
Martin Maechler,
ETH Zurich
> On Wed, Sep 7, 2016 at 3:51 PM, Aanchal Sharma
> <aanchalsharma833 at gmail.com> wrote:
>> Hi Simon
>>
>> I am facing the same problem as described above. I am trying to fit a Gaussian
>> mixture model to my data using normalmixEM. I am running an R script which
>> has this function running as part of it for about 17000 datasets (in a loop).
>> The script runs fine for some datasets, but it terminates when it
>> encounters one dataset with the following error:
>>
>> Error in normalmixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2, :
>> Too many tries!
>>
>> (command used: expr_mix_gau <- normalmixEM(expr_glm_residuals, lambda =
>> c(0.75,0.25), k = 2, epsilon = 1e-08, maxit = 10000, maxrestarts=200, verb
>> = TRUE))
>> (expr_glm_residuals is my dataset which has residual values for different
>> samples)
>>
>> It is suggested that one should define the mu and sigma in the command by
>> looking at your dataset. But in my case there are many datasets and it will
>> keep on changing every time. please suggest what can I do to resolve this
>> issue.
>>
>> Regards
>> Anchal
>>
>> On Tuesday, 16 July 2013 17:53:09 UTC-4, Simon Zehnder wrote:
>>>
>>> Hi Tjun Kiat Teo,
>>>
>>> You are trying to fit a Normal mixture to some data. The Normal mixture is very
>>> delicate when it comes to parameter search: if the variance of a component
>>> centered on a data point gets closer and closer to zero, the log-likelihood
>>> grows without bound, whatever the values of the remaining parameters.
>>> Furthermore, the EM algorithm is known to sometimes take very long to converge.
>>>
>>> Try the following:
>>>
>>> Use as starting values for the component parameters:
>>>
>>> start.par <- mean(your.data, na.rm = TRUE) + sd(your.data, na.rm = TRUE) *
>>> runif(K)
>>>
>>> For the weights just use either 1/K or the R cluster function with K
>>> clusters
>>>
>>> Here K is the number of components. Further, increase the maximum number of
>>> iterations. You could also try randomizing the start parameters and
>>> running an SEM (Stochastic EM). In my opinion, the better method in this case
>>> is a Bayesian one: MCMC.
>>>
>>>
>>> Best
>>>
>>> Simon
>>>
>>>
>>> On Jul 16, 2013, at 10:59 PM, Tjun Kiat Teo <teot... at gmail.com> wrote:
>>>
>>> > I was trying to use normalmixEM in mixtools and I got this error
>>> > message:
>>> >
>>> > One of the variances is going to zero; trying new starting values.
>>> > Error in normalmixEM(as.matrix(temp[[gc]][, -(f + 1)])) : Too many tries!
>>> >
>>> > Are there any other packages for fitting mixture distributions ?
>>> >
>>> >
>>> > Tjun Kiat Teo
>>> >
>>> > ______________________________________________
>>> > R-h... at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>>