[R] Three-component Negative Binomial Mixture: R code

Thu Nov 10 21:22:23 CET 2016

On Thu, 10 Nov 2016, danilo.carita at uniparthenope.it wrote:

> Thank you for your hints, now the goodness of fit test provides me good 
> results, but surprisingly for me the three-component model turns out to be 
> worse than the two-component one (indeed, I focused on the three-component 
> mixture because the two-component one exhibits a low p-value).
>
> In addition, I have noticed that for some data the function fails to find 
> good starting values, as you have mentioned in your previuous answer. The 
> problem is that the driver FLXMRnegbin() allows to specify only the theta 
> parameter (and only one value, even in the event of mixtures of two or more 
> components).
>
> I have read the description of flexmix() function too, but it seems that it 
> does not allow to set starting values for the parameters of the model. Am I 
> right? Or is there a way to do it?

No, I don't think so. You could look at the FLXMRnegbin() driver code and 
tweak this directly to use better starting values etc. But I think that 
the main issue is that the "theta" parameter often diverges to infinity 
(i.e., a Poisson distribution) if theta is too large in (at least) one of 
the components.

Given that the negbin distribution is a continuous mixture of Poisson 
distributions, I'm not sure whether approaching such data with a finite 
mixture of such continuous mixtures.

What to do with this situation, certainly depends on the data you have and 
the questions you have about it. And I concur with Bert's advice of 
contacting a local statistics expert for discussion of such issues.

hth,
Z

>
>
>
> Achim Zeileis <Achim.Zeileis at uibk.ac.at> ha scritto:
>
>> On Tue, 8 Nov 2016, danilo.carita at uniparthenope.it wrote:
>> 
>>> I tried the function flexmix() with the driver FLXMRnegbin() with two 
>>> components first, in order to compare its results with those provided by 
>>> my function mixnbinom(). In particular, I ran the following code:
>>> 
>>> 
>>>> fm0 <- flexmix(y ~ 1, data = data.frame(y), k = 2, model = FLXMRnegbin())
>>> 
>>> 
>>> where "y" is my vector of counts. The previous function provided me the 
>>> following parameters:
>>> 
>>>
>>>>                 Comp.1   Comp.2
>>>> coef.(Intercept) 1.2746536 1.788578
>>>> theta            0.1418201 5.028766
>>> 
>>> 
>>> with priors 0.342874 and 0.657126, respectively. I assume that the 
>>> coefficients "Intercept" represent the two means of the model (mu1 and 
>>> mu2),
>> 
>> No, a log link is employed, i.e., exp(1.2746536) and exp(1.788578) are the 
>> means.
>> 
>>> while the "theta" coefficients are the size parameters (size1 and size2).
>> 
>> Yes.
>> 
>>> Unfortunately, unlike my function mixnbinom(), the model computed with 
>>> flexmix() did not provide a good fit to my data (p-value ~0).
>>> 
>>> Is there something wrong in the process above?
>> 
>> Hard to say without a reproducible example. Using parameter values similar 
>> to the ones you cite above, the following seems to do a reasonable job:
>> 
>> ## packages
>> library("countreg")
>> library("flexmix")
>> 
>> ## artificial data from two NB distributions:
>> ## 1/3 is NB(mu = 3.5, theta = 0.2) and
>> ## 2/3 is NB(mu = 6.0, theta = 5.0)
>> set.seed(1)
>> y <- c(rnbinom(200, mu = 3.5, size = 0.2), rnbinom(400, mu = 6, size = 5))
>> 
>> ## fit 2-component mixture model
>> set.seed(1)
>> fm <- flexmix(y ~ 1, k = 2, model = FLXMRnegbin())
>> 
>> ## inspect estimated parameters -> look acceptable
>> parameters(fm)
>> exp(parameters(fm)[1,])
>> 
>> My experience was that finding good starting values may be a problem for 
>> flexmix(). So maybe setting these in some better way would be beneficial.
>
>
>
> -------------------------------------------------------------
> Danilo Carità
>
> PhD Candidate
> University of Naples "Parthenope"
> Dipartimento di Studi Aziendali e Quantitativi
> via G. Parisi, 13, 80132 Napoli - Italy
> -------------------------------------------------------------
>