[R] mixture univariate distributions fit

PIKAL Petr petr.pikal using precheza.cz
Mon Jan 3 10:37:36 CET 2022


Hallo Bert

The discussion is starting to drift off topic here, as you already pointed 
out. There is probably no package (or function) in R designed for easy fitting 
of overlapping peaks (distributions). With the original data one could use 
mixtools; with density or cumulative density values, Ivan's suggestion seems 
to work reasonably well.
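
For the cumulative case, here is a rough sketch of the same idea with base 
nls() and pnorm() instead of dnorm(). The data are simulated, and the weight 
parameter w, the other parameter names and the starting values are assumptions 
made up for this illustration, not measured values.

```r
# Simulated cumulative curve: 1/3 of N(0.3, 0.1) plus 2/3 of N(0.7, 0.1).
# All numbers are invented for illustration only.
set.seed(1)
x <- (0:100) / 100
ycum <- pnorm(x, 0.3, 0.1) / 3 + 2 * pnorm(x, 0.7, 0.1) / 3 +
  rnorm(length(x), sd = 0.001)  # small noise; nls() dislikes zero-residual data

# Fit the mixing weight and both (mean, sd) pairs by nonlinear least squares.
# The starting values are deliberately close to the simulated truth.
fit_cum <- nls(
  ycum ~ w * pnorm(x, mu1, sd1) + (1 - w) * pnorm(x, mu2, sd2),
  start = list(w = 0.4, mu1 = 0.28, sd1 = 0.12, mu2 = 0.72, sd2 = 0.12)
)
round(coef(fit_cum), 3)
```

With starting values further from the truth, the same convergence problems as 
in the density fit are to be expected.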

To your questions:
1. No, the peak locations are not known. If I decided to code the function 
(package) myself, I would start with a plot and let the user select possible 
locations with locator().

2. No, but one should restrict the number of components to some reasonable 
value.

3. In particle size measurement it is usually a lognormal or normal 
distribution, for which the approach suggested by Ivan is a workable solution. 
However, in the other case I have in mind, the peak function could be more 
variable (Fraser-Suzuki, Cauchy, Pseudo-Voigt, ...) and I would need to program 
the curves myself. A possible way is to make a plot with the starting values 
and let the user change them until the fit is relatively close to the measured 
values.
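
A minimal sketch of how points 1 and 3 might start. The peak functions use 
commonly quoted height-normalised forms (h = height, pos = position, w = full 
width at half maximum, a = asymmetry, eta = Lorentzian fraction). These 
parameter names, the helper candidate_peaks() and all numbers are assumptions 
for illustration. Local maxima are only a first guess, since peaks need not 
sit at the modes of the individual components.

```r
# Height-normalised peak shapes; w is the full width at half maximum (FWHM).
gauss_peak   <- function(x, h, pos, w) h * exp(-4 * log(2) * ((x - pos) / w)^2)
lorentz_peak <- function(x, h, pos, w) h / (1 + 4 * ((x - pos) / w)^2)  # Cauchy

# Pseudo-Voigt: linear blend of Lorentzian and Gaussian, 0 <= eta <= 1.
pseudo_voigt <- function(x, h, pos, w, eta) {
  eta * lorentz_peak(x, h, pos, w) + (1 - eta) * gauss_peak(x, h, pos, w)
}

# Fraser-Suzuki: asymmetric peak with asymmetry a != 0; set to zero where
# the logarithm is undefined.
fraser_suzuki <- function(x, h, pos, w, a) {
  arg <- 1 + 2 * a * (x - pos) / w
  y <- numeric(length(x))
  ok <- arg > 0
  y[ok] <- h * exp(-log(2) * (log(arg[ok]) / a)^2)
  y
}

# Candidate peak positions: clicked by the user with locator(), or, as a
# non-interactive fallback, the n highest local maxima of y.
candidate_peaks <- function(x, y, n, use_locator = FALSE) {
  if (use_locator) {
    plot(x, y, type = "l")
    return(sort(locator(n)$x))
  }
  idx <- which(diff(sign(diff(y))) < 0) + 1  # indices of local maxima
  idx <- idx[order(y[idx], decreasing = TRUE)]
  sort(x[head(idx, n)])
}

# Made-up "measured" curve: one pseudo-Voigt and one Fraser-Suzuki peak.
x <- (0:100) / 100
ymeas <- pseudo_voigt(x, 1, 0.3, 0.15, 0.4) +
  fraser_suzuki(x, 0.6, 0.7, 0.2, 0.3)

# Starting curve built from the candidate positions and guessed shapes.
pos0 <- candidate_peaks(x, ymeas, n = 2)
start_curve <- pseudo_voigt(x, 1, pos0[1], 0.2, 0.5) +
  fraser_suzuki(x, 0.5, pos0[2], 0.2, 0.3)
sse <- sum((ymeas - start_curve)^2)  # how far the starting curve is from data

# plot(x, ymeas); lines(x, start_curve)  # compare, adjust the values, repeat
```

Once the starting curve looks close enough, the same functions could go into 
the model formula of nls() or minpack.lm::nlsLM().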

So unless somebody can point me to an R package for such peak shape mixture 
evaluation, I do not consider further discussion necessary. I first need to do 
my homework if I decide to code such a function myself.

Thank you again and best regards.
Petr

> -----Original Message-----
> From: Bert Gunter <bgunter.4567 using gmail.com>
> Sent: Friday, December 31, 2021 6:57 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: Ivan Krylov <krylov.r00t using gmail.com>; r-help mailing list
> <r-help using r-project.org>
> Subject: Re: [R] mixture univariate distributions fit
>
> Petr:
> Please feel free to ignore and not reply if you think the following questions
> are unhelpful.
>
> 1. Do you want to know the location of peaks (local modes) or the
> parameters of the/a mixture distribution? Peaks do not have to be located at
> the modes of the individual components of the mixture.
>
> 2. Do you know the number of components in the mixture? This would
> simplify the problem (a lot, I believe; though those more knowledgeable
> should comment on that).
>
> 3. Do you know that the points on the fitted density you get are obtained as
> a mixture of normals? Or at least of symmetric distributions? ... or whether
> they are obtained by some sort of (algorithmic) density estimation procedure?
>
> Best and New Year's greeting to all,
> Bert
>
>
>
> On Fri, Dec 31, 2021 at 1:49 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
> >
> > Hallo Ivan
> >
> > Thanks. Yes, this approach seems to be viable. I did not consider
> > using dnorm in the fitting procedure. But as you pointed out:
> >
> > > (Some nonlinear least squares problems will be much harder to solve
> > > though.)
> >
> > This simple example is quite easy. The messier the data and the more
> > distributions are mixed in them, the more problematic the selection of
> > correct starting values could be. Errors could be quite common.
> >
> > x <- (0:200)/100
> > y1 <- dnorm(x, mean=.3, sd=.1)
> > y2 <- dnorm(x, mean=.7, sd=.2)
> > y3 <- dnorm(x, mean=.5, sd=.1)
> >
> > ymix <- ((y1+2*y2+y3)/max(y1+2*y2+y3))+rnorm(201, sd=.001)
> > plot(x, ymix)
> >
> > With the starting values of sd1 and sd2 only slightly higher, the fit fails
> > with an error.
> > > fit <- minpack.lm::nlsLM(
> > +  ymix ~ a1 * dnorm(x, mu1, sd1) + a2 * dnorm(x, mu2, sd2)+
> > +  a3 * dnorm(x, mu3, sd3),
> > +  start = c(a1 = 1, mu1 = .3, sd1=.3, a2 = 2, mu2 = .7, sd2 =.3,
> > +  a3 = 1, mu3 = .5, sd3 = .1),
> > +  lower = rep(0, 9) # help minpack avoid NaNs
> > + )
> > Error in nlsModel(formula, mf, start, wts) :
> >   singular gradient matrix at initial parameter estimates
> >
> > If the starting values of sd1 and sd2 are set lower, the gradient matrix is
> > no longer singular and the fit arrives at a result.
> >
> > Well, it seems that the only way to proceed is to code such a function
> > myself and take care of suitable starting values.
> >
> > Best regards.
> > Petr
> >
> > > -----Original Message-----
> > > From: Ivan Krylov <krylov.r00t using gmail.com>
> > > Sent: Friday, December 31, 2021 9:26 AM
> > > To: PIKAL Petr <petr.pikal using precheza.cz>
> > > Cc: r-help mailing list <r-help using r-project.org>
> > > Subject: Re: [R] mixture univariate distributions fit
> > >
> > > On Fri, 31 Dec 2021 07:59:11 +0000
> > > PIKAL Petr <petr.pikal using precheza.cz> wrote:
> > >
> > > > x <- (0:100)/100
> > > > y1 <- dnorm(x, mean=.3, sd=.1)
> > > > y2 <- dnorm(x, mean=.7, sd=.1)
> > > > ymix <- ((y1+2*y2)/max(y1+2*y2))
> > >
> > > > My question is if there is some package or function which could
> > > > get those values ***directly from x and ymix values***, which is
> > > > basically what is measured in my case.
> > >
> > > Apologies if I'm missing something, but, this being a peak fitting
> > > problem, shouldn't nls() (or something from the minpack.lm or nlsr
> > > packages) work for you here?
> > >
> > > minpack.lm::nlsLM(
> > >  ymix ~ a1 * dnorm(x, mu1, sigma1) + a2 * dnorm(x, mu2, sigma2),
> > > start = c(a1 = 1, mu1 = 0, sigma1 = 1, a2 = 1, mu2 = 1, sigma2 = 1),
> > > lower = rep(0, 6) # help minpack avoid NaNs
> > > )
> > > # Nonlinear regression model
> > > #  model: ymix ~ a1 * dnorm(x, mu1, sigma1) + a2 * dnorm(x, mu2, sigma2)
> > > #  data: parent.frame()
> > > #      a1    mu1 sigma1     a2    mu2 sigma2
> > > #  0.1253 0.3000 0.1000 0.2506 0.7000 0.1000
> > > # residual sum-of-squares: 1.289e-31
> > > #
> > > # Number of iterations to convergence: 23
> > > # Achieved convergence tolerance: 1.49e-08
> > >
> > > (Some nonlinear least squares problems will be much harder to solve
> > > though.)
> > >
> > > --
> > > Best regards,
> > > Ivan