[R-sig-ME] negative binomial distribution mixture model

Fri Mar 7 19:42:51 CET 2014

On 14-03-06 07:50 AM, Fabian Amman wrote:
> Dear mixed-models group
> 
> I'd like to ask for your advice in the following matter:
> 
> I have two vectors of observed count data, 'A' and 'B', where
> counts A[n] and B[n] refer to the same observation point.
> 'A' is assumed to follow a negative binomial distribution.
> 'B' is assumed to result from two underlying processes: an
> independent negative binomial distribution and, in addition, a
> kind of shadowing effect from 'A', where a certain fraction 'f' of
> the counts from 'A' are also observed in 'B'.
> 
> Accordingly, one can simulate such data by:
> 
>         set.seed(13579)
>         f <- runif(1)
>         A <- rnbinom(100, mu = 100, size = 1)
>         B <- floor(f*A) + rnbinom(100, mu = 20, size = 1)
>         
> Now to my question: Since 'B' is a mixed model of two negative binomial
> distributed variables, how can I estimate the value of factor 'f'
> explaining the underlying data best?
> 
> The final result should be a vector 'B2', correcting 'B' for its shadow
> portion: 'B2 = B - f*A'. Since I know 'A' and 'B', to obtain this I need
> to estimate 'f' from the data under the assumption of a
> negative binomial distribution.
> 
> As I guess you could figure from my question, I'm not really a statistics
> or R expert, so any help is highly appreciated.
> 
> Thank you very much in advance.
> Regards
> Fabian
> 

  This isn't really appropriate for this list: it's a "mixture" model,
of sorts, rather than a "mixed" model.  ("Mixed" models, at least in the
context of this list, refer to models where the effects of some of the
predictors are assumed to be random variables drawn from a multivariate
normal distribution.)  If your model doesn't have observed categorical
variables that are associated with unobserved Gaussian random variables,
then you're not really in the right place.

  I'll make these comments, though.

  (1) If I had to do this I would probably set it up in BUGS
(WinBUGS/JAGS).  However, there may be a more elegant/closed-form way to
do it.
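
  As one possible alternative to BUGS, here is a minimal sketch of a
direct maximum-likelihood fit.  It is not a tested recipe.  It assumes
the shadow counts are a binomial sample of A (the formulation in point
(2)), so that B = X + Y with X | A ~ Binomial(A, f) and
Y ~ NB(mu, size); the unobserved X is summed out for each observation.
The value f_true, the sample size, the starting values and the helper
name nll are illustrative choices, not anything from the original
question.

```r
## Sketch only: assumes B = X + Y, with X | A ~ Binomial(A, f) and
## Y ~ NegBin(mu, size) independent of X given A.
set.seed(13579)
n      <- 200
f_true <- 0.3   # illustrative value, not taken from the original post
A <- rnbinom(n, mu = 100, size = 1)
B <- rbinom(n, size = A, prob = f_true) + rnbinom(n, mu = 20, size = 1)

## Negative log-likelihood of B given A.  The unobserved shadow count x
## is summed out (a discrete convolution), using log-sum-exp for stability.
nll <- function(par, A, B) {
  f    <- plogis(par[1])   # maps to (0, 1)
  mu   <- exp(par[2])      # maps to (0, Inf)
  size <- exp(par[3])      # maps to (0, Inf)
  ll <- mapply(function(a, b) {
    x  <- 0:min(a, b)
    lp <- dbinom(x, size = a, prob = f, log = TRUE) +
          dnbinom(b - x, mu = mu, size = size, log = TRUE)
    m  <- max(lp)
    m + log(sum(exp(lp - m)))
  }, A, B)
  -sum(ll)
}

## Nelder-Mead (the optim() default) on the transformed parameters.
start <- c(qlogis(0.5), log(mean(B)), 0)
fit   <- optim(start, nll, A = A, B = B)
est   <- c(f = plogis(fit$par[1]), mu = exp(fit$par[2]),
           size = exp(fit$par[3]))
est
```

  With an estimate of f in hand, the expected shadow portion of each
B[n] is est["f"] * A[n].  Subtracting it gives a correction on average
only, not the actual unobserved shadow count for that observation.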

  (2) Your simulation model will make more sense as a *statistical*
model if you say

   B <- rbinom(100, size=A, prob=f) + rnbinom(100, mu=20, size=1)

instead of using floor().  Then B is at least a well-defined random
variable (the sum of a binomial sample of a NB variable and a NB
variable).  I don't remember my generating-function tricks very well,
but it's even possible that a binomial sample of a NB variable is still
NB ...
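
  A quick simulation check of that last point (a sketch, not a proof):
in R's mu/size parameterization, a binomial sample with probability f
from NB(mu, size) should match NB(f*mu, size), which is what the
gamma-Poisson representation of the NB implies.  The values of n and f
below are arbitrary illustrative choices.

```r
## Simulation check (not a proof): binomial thinning of a negative binomial.
set.seed(1)
n    <- 1e5
f    <- 0.3   # illustrative thinning probability
mu   <- 100
size <- 1

A       <- rnbinom(n, mu = mu, size = size)
thinned <- rbinom(n, size = A, prob = f)          # binomial sample of A
direct  <- rnbinom(n, mu = f * mu, size = size)   # candidate NB, same size

## Compare the first two moments and a few quantiles.
rbind(thinned = c(mean = mean(thinned), var = var(thinned)),
      direct  = c(mean = mean(direct),  var = var(direct)))
quantile(thinned, c(0.1, 0.5, 0.9, 0.99))
quantile(direct,  c(0.1, 0.5, 0.9, 0.99))
```

  The two samples should agree up to simulation noise; the theoretical
mean is f*mu and the variance is f*mu + (f*mu)^2/size.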

  I would suggest CrossValidated (http://stats.stackexchange.com) as a
better venue for this question.

 good luck,
    Ben Bolker