[R] separate mixture of gamma and normal (with mixtools or ??)
David Winsemius
dwinsemius at comcast.net
Thu Jan 26 21:26:57 CET 2017
> On Jan 23, 2017, at 11:47 PM, Thomas Petzoldt <thpe at simecol.de> wrote:
>
> Dear Bert,
>
> thank you very much for your suggestion. You are right, ill-conditioning was sometimes a problem for 3 components, but not in the two-component case. The modes are well separated, and the sample size is high.
>
> My main problem is (1) the shape of the distributions and (2) the diversity of available packages and approaches to this topic.
>
> In the mean time I made some progress in this matter by treating the data as a mixture of gamma distributions (package mixdist, see below), so what I want to know is the purely R technical question ;-)
>
> Has someone else has ever stumbled across something like this and can make a suggestion which package to use?
In survival analysis of cancer cases, the question of cure comes up often. Physicians sometimes have a naive notion of survival to 5 years after definitive treatment with no evident recurrence being equivalent to 'cure', despite the fact that there is great heterogeneity in the recurrence and survival distribution of different cancer types. I have see papers in the medical statistical literature that used mixtures of Weibull variates to model this problem. The cancer-specific survival is often exponential (mu=1) or "sub-exponential" (shape < 1) whereas non-cancer survival times are "super-exponential" (shape >> 1). When I ran your second simulation with dist='weibull' I get:
> library("mixdist")
> set.seed(123)
> lambda <- c(0.25, 0.75)
> N <- 2000
> x <- c(rexp(lambda[1]*N, 1), rnorm(lambda[2]*N, 20, 4))
> xx <- mixgroup(x, breaks=0:40)
> pp <- mixparam(mu=c(.5, 8), sigma=c(1, 10), pi=c(0.2, 0.5))
> mix <- mix(xx, pp, dist="weibull", emsteps=10)
>
> summary(mix)
Parameters:
pi mu sigma
1 0.2492 1.016 0.8702
2 0.7508 20.079 4.2328
Standard Errors:
pi.se mu.se sigma.se
1 0.009683 0.04249 0.05137
2 0.009683 0.10972 0.06802
Analysis of Variance Table
Df Chisq Pr(>Chisq)
Residuals 29 80.407 1.01e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> p <- coef(mix)
> p
pi mu sigma
1 0.2492477 1.016319 0.8702055
2 0.7507523 20.079093 4.2327769
So the exponential parameter at least is well-estimated. I believe that Weibull variates with shape >> 1 are approximately normal, but I know that your mathematical sophistication exceeds mine by quite a bit, so consider this only a dilettante's comment.
--
David.
>
> Thanks, Thomas
>
>
> ## Approximate an Exponential+Gaussian mixture with
> ## a mixture of Gammas
>
> library("mixdist")
> set.seed(123)
> lambda <- c(0.25, 0.75)
> N <- 2000
> x <- c(rexp(lambda[1]*N, 1), rnorm(lambda[2]*N, 20, 4))
>
> xx <- mixgroup(x, breaks=0:40)
> pp <- mixparam(mu=c(1, 8), sigma=c(1, 3), pi=c(0.2, 0.5))
> mix <- mix(xx, pp, dist="gamma", emsteps=10)
>
> summary(mix)
> p <- coef(mix)
> beta <- with(p, sigma^2/mu)
> alpha <- with(p, mu /beta)
> lambda <- p$pi
>
> plot(mix, xlim=c(0, 35))
> x1 <- seq(0, 35, 0.1)
> lines(x1, lambda[1]*dgamma(x1, alpha[1], 1/beta[1]),
> col="orange", lwd=2)
> lines(x1, lambda[2]*dgamma(x1, alpha[2], 1/beta[2]),
> col="magenta", lwd=2)
>
>
>
> Am 24.01.2017 um 00:34 schrieb Bert Gunter:
>> Fitting multicomponent mixtures distributions -- and 3 is already a
>> lot of components -- is inherently ill-conditioned. You may need to
>> reassess your strategy. You might wish to post on stackexchange
>> instead to discuss such statistical issues.
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Mon, Jan 23, 2017 at 2:32 PM, Thomas Petzoldt <thpe at simecol.de> wrote:
>>> Dear friends,
>>>
>>> I am trying to separate bi- (and sometimes tri-) modal univariate mixtures
>>> of biological data, where the first component is left bounded (e.g.
>>> exponential or gamma) and the other(s) approximately Gaussian.
>>>
>>> After checking several packages, I'm not really clear what to do. Here is an
>>> example with "mixtools" that already works quite good, however, the left
>>> component is not Gaussian (and not symmetric).
>>>
>>> Any idea about a more adequate function or package for this problem?
>>>
>>> Thanks a lot!
>>>
>>> Thomas
>>>
>>>
>>>
>>> library(mixtools)
>>> set.seed(123)
>>>
>>> lambda <- c(0.25, 0.75)
>>> N <- 200
>>>
>>> ## dist1 ~ gamma (or exponential as a special case)
>>> #dist1 <- rexp(lambda[1]*N, 1)
>>> dist1 <- rgamma(lambda[1]*N, 1, 1)
>>>
>>> ## dist2 ~ normal
>>> dist2 <- rnorm(lambda[2]*N, 12, 2)
>>>
>>> ## mixture
>>> x <- c(dist1, dist2)
>>>
>>> mix <- spEMsymloc(x, mu0=2, eps=1e-3, verbose=TRUE)
>>> plot(mix, xlim=c(0, 25))
>>> summary(mix)
>>>
>>>
>>> --
>>> Thomas Petzoldt
>>> TU Dresden, Institute of Hydrobiology
>>> http://www.tu-dresden.de/Members/thomas.petzoldt
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
More information about the R-help
mailing list