[R] "logistic" + "neg binomial" + ...

Sat Sep 23 17:15:17 CEST 2006

On 22-Sep-06 Ted Harding wrote:
> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
> 
> Basically, each of a set of items is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X,Y say;
> [...]
>  ... one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
> 
>   (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
> 
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.

I may be getting closer, but am well off target still!

Starting with the case of no covariates, one has

   p*(1-p)^(n-1) (n = 1,2,...) or p*(1-p)^y (y = 0,1,...)

which is a particular case of a negative binomial, with
"target successes" = 1. In terms of the two-stage model
for a negative binomial (see V&R MASS section 7.4), this
corresponds to

   (mu^y * theta^theta)/(mu + theta)^(theta + y)
   *gamma(theta + y)/(gamma(theta)*y!)

with theta = 1 and p = theta/(mu + theta) = 1/(mu + 1).
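
As a quick numerical check of the identity above, the following
sketch compares R's dnbinom() with size = 1 against dgeom() and
against p*(1-p)^y computed by hand. The values of mu and y are
arbitrary choices made for illustration only.

```r
# Illustrative values only; these are assumptions, not data from the problem.
mu <- 2.5
y  <- 0:10
p  <- 1 / (mu + 1)                      # p = theta/(mu + theta) with theta = 1

nb   <- dnbinom(y, size = 1, mu = mu)   # negative binomial with theta = 1
geo  <- dgeom(y, prob = p)              # geometric, failures before 1st success
hand <- p * (1 - p)^y                   # the closed form written out

# All three should agree up to floating-point error.
stopifnot(isTRUE(all.equal(nb, geo)), isTRUE(all.equal(nb, hand)))
```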

This was in the context of having landed on glm.nb in MASS.

However, glm.nb fits theta, which I would want to fix at 1.

I don't see anything in ?glm.nb which allows theta to be
held at a fixed value.
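
One possible route, offered as a sketch rather than a settled
answer: MASS also provides a negative.binomial() family object
which takes theta as a fixed argument and can be passed to plain
glm(). The simulated counts below are an assumption made purely
for illustration, and only the no-covariate case is shown.

```r
library(MASS)  # provides negative.binomial() as well as glm.nb()

set.seed(1)
# Hypothetical data: number of impacts survived before failure, per item.
y <- rgeom(200, prob = 0.3)

# glm() with theta held at 1, in place of glm.nb(), which estimates theta.
fit <- glm(y ~ 1, family = negative.binomial(theta = 1))

mu_hat <- exp(coef(fit)[[1]])   # the family's default link is log
p_hat  <- 1 / (1 + mu_hat)      # p = theta/(mu + theta) with theta = 1
c(mu_hat = mu_hat, p_hat = p_hat)
```

With an intercept only, mu_hat should agree with mean(y) up to
the convergence tolerance of the fit.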

The next snag is that it would not be straightforward, as
far as I can see, to introduce covariates. The typical data
set would be a set of sequences each of the form

   X1 Y1 0
   X2 Y2 0
   .......
   Xn Yn 1

where the value of n is random, so varies from sequence to
sequence. In the above negative binomial framework, y=(n-1)
and the covariates for that value of y would be the set

 (X1,X2,...,Xn, Y1,Y2,...,Yn)

and therefore of variable length for each observation (i.e.
sequence as above, or value of y per sequence). I don't
know how one can accommodate a variable length of covariates
per observation.
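
For comparison, the product likelihood in the quoted message can
be read one row at a time: each row of a sequence contributes
P[i] if its last column is 1 and (1-P[i]) if it is 0, which is a
Bernoulli term. On that reading the stacked rows can go straight
into a binomial glm(), with no variable-length covariate vector
needed. The sketch below assumes this reading; the simulation,
the coefficient values and the names sim_item, id and fail are
all hypothetical.

```r
set.seed(42)
# Assumed "true" logit-scale coefficients: intercept, X, Y (illustration only).
beta <- c(-2, 0.8, -0.5)

# Simulate one item's sequence of impacts, stopping at the first failure.
sim_item <- function(id, max_impacts = 1000) {
  rows <- vector("list", max_impacts)
  for (i in seq_len(max_impacts)) {
    x <- rnorm(1)
    y <- rnorm(1)
    p <- plogis(beta[1] + beta[2] * x + beta[3] * y)
    fail <- rbinom(1, 1, p)
    rows[[i]] <- data.frame(id = id, X = x, Y = y, fail = fail)
    if (fail == 1) break
  }
  do.call(rbind, rows[seq_len(i)])   # keep only the impacts that occurred
}

# One row per impact, sequences of random length stacked on top of each other.
dat <- do.call(rbind, lapply(1:300, sim_item))

# Each row contributes P^fail * (1-P)^(1-fail); the product over an item's
# rows is (1-P[1])*...*(1-P[n-1])*P[n].
fit <- glm(fail ~ X + Y, family = binomial, data = dat)
coef(fit)
```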

So it looks as though glm.nb, while thinking along the lines
I want, won't fit the bill!

However, other features of glm.nb would be suitable, since

  p/(1-p) = 1/mu

and a logistic model for p therefore means a linear fit to
log(mu), and glm.nb allows a log link.
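
Taking logs of p/(1-p) = 1/mu gives logit(p) = -log(mu), so in
the no-covariate case a coefficient on the log(mu) scale is the
negative of the corresponding one on the logit(p) scale. A small
check, with arbitrary values of mu chosen for illustration:

```r
mu <- c(0.2, 1, 3.5, 10)   # arbitrary illustrative means
p  <- 1 / (1 + mu)         # theta = 1

logit_p <- qlogis(p)       # log(p/(1-p))
stopifnot(isTRUE(all.equal(logit_p, -log(mu))))
```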

Comments welcome!
With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-06                                       Time: 16:15:14
------------------------------ XFMail ------------------------------