[R] "logistic" + "neg binomial" + ...
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Sat Sep 23 17:15:17 CEST 2006
On 22-Sep-06 Ted Harding wrote:
> I've just come across a kind of problem which leads
> me to wonder how to approach it in R.
>
> Basically, each of a set of items is subjected to a series
> of "impacts" until it eventually "fails". The "force"
> of each impact would depend on covariates X,Y say;
> [...]
> ... one could envisage
> something like a logistic model for the probability
> of failure at each impact, leading to a kind of
> generalised "geometric distribution" -- that is,
> the likelihood for each item would be of the form
>
> (1-P[1])*(1-P[2])*...*(1-P[n-1])*P[n]
>
> where P[i] could have a logistic model in terms of
> the values of X[i] and Y[i], and n is the index of
> the impact at which failure occurred. That is then
> a solvable problem.
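A minimal sketch of that per-item log-likelihood, written in Python purely for illustration. It assumes logit(P[i]) = b0 + bX*X[i] + bY*Y[i]; the coefficient names and the example values are hypothetical.

```python
import math

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def item_loglik(xs, ys, beta):
    """Log of (1-P[1])*...*(1-P[n-1])*P[n] for one item failing at impact n.

    xs, ys: covariates X[i], Y[i] for impacts i = 1..n (n = len(xs)).
    beta:   hypothetical coefficients (b0, bX, bY) of the assumed model
            logit(P[i]) = b0 + bX*X[i] + bY*Y[i].
    """
    b0, bx, by = beta
    n = len(xs)
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        p = logistic(b0 + bx * x + by * y)
        # survived every impact except the last one, which is the failure
        total += math.log(p) if i == n - 1 else math.log(1.0 - p)
    return total

# Example: an item that fails at its third impact.
print(item_loglik([0.5, 1.0, 1.5], [0.0, 0.0, 1.0], (-1.0, 0.8, 0.3)))
```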
I may be getting closer, but am well off target still!
Starting with the case of no covariates, one has
p*(1-p)^(n-1) (n = 1,2,...), or equivalently p*(1-p)^y with y = n-1 (y = 0,1,...),
which is a particular case of a negative binomial, with
"target successes" = 1. In terms of the two-stage model
for a negative binomial (see V&R MASS section 7.4), this
corresponds to
(mu^y * theta^theta)/(mu + theta)^(theta + y)
*gamma(theta + y)/(gamma(theta)*y!)
with theta = 1 and p = theta/(mu + theta) = 1/(mu + 1).
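The theta = 1 reduction can be checked numerically. This Python sketch (an illustration, not MASS code) evaluates the two-stage negative binomial probability above on the log scale and compares it with p*(1-p)^y; the value mu = 2.5 is arbitrary.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf in the (mu, theta) form of MASS section 7.4:
    mu^y * theta^theta / (mu + theta)^(theta + y) * gamma(theta + y) / (gamma(theta) * y!),
    computed via logs for numerical stability."""
    log_f = (y * math.log(mu) + theta * math.log(theta)
             - (theta + y) * math.log(mu + theta)
             + math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1))
    return math.exp(log_f)

def geom_pmf(y, p):
    """P(Y = y) = p * (1 - p)^y for y = 0, 1, ... survived impacts."""
    return p * (1.0 - p) ** y

mu = 2.5
p = 1.0 / (mu + 1.0)   # p = theta / (mu + theta) with theta = 1
for y in range(6):
    print(y, nb_pmf(y, mu, 1.0), geom_pmf(y, p))
```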
I arrived at this after landing on glm.nb in MASS.
However, glm.nb estimates theta, whereas I would want to fix it at 1.
I don't see anything in ?glm.nb which allows theta to be
held at a fixed value.
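For what it is worth, MASS also supplies a negative.binomial(theta) family for use with glm(), in which theta is given rather than estimated, so something like glm(y ~ ..., family = negative.binomial(theta = 1)) may do what is wanted here (worth checking against ?negative.binomial). In the no-covariate case the fixed-theta estimate is explicit: setting the derivative of sum[y*log(mu) - (1+y)*log(1+mu)] to zero gives mu_hat = mean(y), i.e. p_hat = 1/(1 + mean(y)). The Python sketch below, using made-up counts, confirms this numerically.

```python
import math

def geom_loglik(mu, ys):
    """Log-likelihood of counts ys under the negative binomial with theta fixed
    at 1 (a geometric): sum of y*log(mu) - (1 + y)*log(1 + mu)."""
    return sum(y * math.log(mu) - (1 + y) * math.log(1.0 + mu) for y in ys)

ys = [0, 2, 1, 4, 0, 3]          # hypothetical numbers of survived impacts
mu_hat = sum(ys) / len(ys)       # closed-form MLE when theta = 1
p_hat = 1.0 / (1.0 + mu_hat)     # implied per-impact failure probability

# The log-likelihood should be lower on either side of mu_hat.
for mu in (0.8 * mu_hat, mu_hat, 1.2 * mu_hat):
    print(round(mu, 3), round(geom_loglik(mu, ys), 4))
print("p_hat =", p_hat)
```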
The next snag is that it would not be straightforward, as
far as I can see, to introduce covariates. The typical data
set would be a set of sequences each of the form
X1 Y1 0
X2 Y2 0
.......
Xn Yn 1
where the value of n is random, so varies from sequence to
sequence. In the above negative binomial framework, y=(n-1)
and the covariates for that value of y would be the set
(X1,X2,...,Xn, Y1,Y2,...,Yn)
and therefore of variable length for each observation (i.e.
sequence as above, or value of y per sequence). I don't
know how one can accommodate a covariate vector whose length
varies from one observation to the next.
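One way round the variable-length problem, assuming P[i] depends only on the covariates at impact i, is to note that the product likelihood in the quoted message factorises into one Bernoulli term per impact: 0 with probability 1-P[i] for each survived impact, 1 with probability P[i] at the failure. The stacked rows X Y 0/1 shown above can then be treated as independent binary observations (the usual discrete-time hazard formulation), so each row carries a fixed number of covariates. This Python sketch, with hypothetical sequences and coefficients, checks that the two log-likelihoods agree.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def stack(sequences):
    """Turn per-item sequences [(X1, Y1), ..., (Xn, Yn)] into long-format rows
    (X, Y, fail), with fail = 0 for survived impacts and 1 for the last one."""
    rows = []
    for seq in sequences:
        for i, (x, y) in enumerate(seq):
            rows.append((x, y, 1 if i == len(seq) - 1 else 0))
    return rows

def product_loglik(sequences, beta):
    """Sum over items of log[(1-P[1])...(1-P[n-1]) * P[n]]."""
    b0, bx, by = beta
    total = 0.0
    for seq in sequences:
        for i, (x, y) in enumerate(seq):
            p = inv_logit(b0 + bx * x + by * y)
            total += math.log(p) if i == len(seq) - 1 else math.log(1.0 - p)
    return total

def bernoulli_loglik(rows, beta):
    """Ordinary logistic-regression log-likelihood over the stacked rows."""
    b0, bx, by = beta
    total = 0.0
    for x, y, fail in rows:
        p = inv_logit(b0 + bx * x + by * y)
        total += fail * math.log(p) + (1 - fail) * math.log(1.0 - p)
    return total

# Hypothetical data: three items failing at impacts 3, 1 and 4 respectively.
items = [
    [(0.2, 1.0), (0.4, 0.5), (0.9, 0.0)],
    [(1.5, 2.0)],
    [(0.1, 0.0), (0.3, 0.0), (0.2, 1.0), (0.8, 1.0)],
]
beta = (-1.2, 0.9, 0.4)   # hypothetical coefficients (b0, bX, bY)
print(product_loglik(items, beta), bernoulli_loglik(stack(items), beta))
```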
So it looks as though glm.nb, though it works along the lines
I am thinking, won't fit the bill!
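If each row X Y 0/1 of the layout above is treated as one binary observation (valid when P[i] depends only on that impact's covariates, an assumption), the coefficients can be found by Newton-Raphson, which is essentially what glm(fail ~ X + Y, family = binomial) does in R via IRLS. The following Python sketch illustrates this on data simulated from hypothetical coefficients; the function names are illustrative only.

```python
import math
import random

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, iters=25):
    """Newton-Raphson for logit(P) = b0 + bX*X + bY*Y on rows (X, Y, fail)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, y, fail in rows:
            z = (1.0, x, y)
            p = inv_logit(sum(b * v for b, v in zip(beta, z)))
            w = p * (1.0 - p)
            for j in range(3):
                grad[j] += (fail - p) * z[j]          # score vector
                for k in range(3):
                    hess[j][k] += w * z[j] * z[k]     # observed information
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulate hypothetical items: impacts continue until the first failure.
random.seed(1)
true_beta = (-2.0, 0.8, 0.5)   # hypothetical coefficients
rows = []
for _ in range(300):
    while True:
        x, y = random.uniform(0, 2), random.choice((0.0, 1.0))
        eta = true_beta[0] + true_beta[1] * x + true_beta[2] * y
        fail = int(random.random() < inv_logit(eta))
        rows.append((x, y, fail))
        if fail:
            break

print(fit_logistic(rows))
```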
However, other features of glm.nb would be suitable, since
p/(1-p) = 1/mu
and a logistic model for p therefore means a linear fit to
log(mu), and glm.nb allows a log link.
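Spelling out that last step: with theta = 1, logit(p) = log(p/(1-p)) = -log(mu), so a linear predictor for log(mu) is the same linear predictor for logit(p) with every coefficient negated. A small Python check, with arbitrary values of mu:

```python
import math

def p_from_mu(mu):
    """Per-impact failure probability when theta = 1: p = 1 / (mu + 1)."""
    return 1.0 / (mu + 1.0)

for mu in (0.25, 1.0, 4.0):
    p = p_from_mu(mu)
    logit_p = math.log(p / (1.0 - p))
    print(mu, p, logit_p, -math.log(mu))   # last two columns should agree
```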
Comments welcome!
With thanks,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Sep-06 Time: 16:15:14
------------------------------ XFMail ------------------------------
More information about the R-help mailing list