[R] Does logistic regression require the independence of samples?

Ben Bolker bolker at ufl.edu
Mon Apr 3 20:17:13 CEST 2006

HelponR <suncertain <at> gmail.com> writes:

> Dear list:
> Thanks a lot for help. I have a question and I could not find clear answers
> easily.
> When we do logistic regression for one type of event of interest as a
> proportion of a broader set of event types, does the logistic regression assume
> that the total number of events of all types is independent of the
> number of events of the type of interest?
> For example, if the counts of one type of event and of all events are two time
> series that vary in the same fashion (both increase or
> decrease with time), can we still use logistic regression to estimate the
> effect of time on the proportion? If not, what is the right thing to do?

  The answer to the general question in the subject is "yes" (logistic
regression assumes independent observations, and its standard errors
in particular become unreliable if observations are correlated), but I think
that in this particular case it's OK: for a multinomial sample,
the number of each type is binomial conditional on the total
number of all types.
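
A minimal sketch of that last claim, using made-up values (the total of
20 events, the three type probabilities, the number of draws and the seed
are all illustrative assumptions, not taken from the question): draw
multinomial samples with a fixed total and compare the distribution of
the count of the first type with the corresponding binomial.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
n <- 20                   # total number of events (illustrative assumption)
p <- c(0.2, 0.3, 0.5)     # probabilities of three event types (illustrative)

## counts of type 1 in 20000 multinomial samples, each with total n
x <- rmultinom(20000, size = n, prob = p)[1, ]

## empirical distribution of the type-1 count vs. Binomial(n, p[1])
emp  <- as.numeric(table(factor(x, levels = 0:n))) / length(x)
theo <- dbinom(0:n, size = n, prob = p[1])
max_diff <- max(abs(emp - theo))   # largest discrepancy over all counts

round(cbind(count = 0:n, empirical = emp, binomial = theo)[1:9, ], 3)
```

If the claim holds, the two columns should agree up to simulation noise
and max_diff should be small.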

I ran a numerical experiment to see if the standard errors
were appropriate for a simple example of this type (if the
correlation were going to screw something up it would be likely
to be the standard errors/confidence intervals rather than 
the point estimates):

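dosim <- function() {
  time <- sort(runif(200))
  ## total events per observation: Poisson, mean increasing with time
  nevents <- rpois(200,10*time)
  ## events of interest: binomial given the total; true slope on logit scale is 10
  type <- rbinom(200,size=nevents,prob=plogis(10*(time-0.5)))
  evmat <- cbind(type,nevents-type)
  m1 <- glm(evmat~time,family="binomial")
  ## return the coefficient-table row (Estimate, Std. Error, ...) for time
  coef(summary(m1))["time",]
}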

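r1 <- replicate(1000,dosim())
true <- 10  ## true slope used in the simulation
## does the nominal 95% Wald interval contain the true slope?
cover <- (r1["Estimate",]<true+1.96*r1["Std. Error",] &
          r1["Estimate",]>true-1.96*r1["Std. Error",])
mean(cover)  ## proportion of intervals covering the true value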

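The estimated coverage of the nominal 95% intervals (the proportion of
simulations in which the interval contained the true slope) came out to
0.941, which seems reasonable ...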
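
One rough way to judge "reasonable" is to compare the gap between 0.941
and the nominal 0.95 with the Monte Carlo standard error of a coverage
estimate based on 1000 replicates. This is only a back-of-the-envelope
sketch; the variable names are made up for illustration.

```r
nsim     <- 1000    # number of simulated data sets, as in replicate() above
nominal  <- 0.95    # nominal coverage of a +/- 1.96 SE interval
observed <- 0.941   # coverage reported above

## binomial standard error of a proportion estimated from nsim replicates,
## computed under the assumption that the true coverage equals the nominal one
mc_se <- sqrt(nominal * (1 - nominal) / nsim)

## distance of the observed coverage from nominal, in Monte Carlo SE units
z <- (observed - nominal) / mc_se

c(mc_se = mc_se, z = z)
```

The standard error is about 0.007, so the observed coverage is roughly
1.3 Monte Carlo standard errors below nominal, which is within ordinary
simulation noise.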

  I'm hoping/figuring that someone more knowledgeable will
jump in with corrections if I've said something terribly
wrong ...

  Ben Bolker
