[R] Does logistic regression require the independence of samples?

Mon Apr 3 20:17:13 CEST 2006

HelponR <suncertain <at> gmail.com> writes:

> 
> Dear list:
> 
> Thanks a lot for your help. I have a question to which I could not easily
> find a clear answer.
> 
> When we do logistic regression for one type of event of interest as a
> proportion of a broader type of events, does logistic regression assume
> that the total number of events is independent of the number of events
> of the type of interest?
> 
> For example, if the count of one type of event and the count of all events
> are two time series that vary in the same fashion (both increase or
> decrease with time), can we still use logistic regression to estimate the
> effect of time on the proportion? If not, what is the right thing to do?
> 

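  The answer to the general question in the subject is "yes" (standard
logistic regression assumes independent observations, and if they are
correlated the standard errors and tests can be badly wrong), but I think
in this particular case it's OK: for a multinomial sample, the number of
events of the type of interest is binomial conditional on the total number
of events, so the fact that the count and the total rise and fall together
does not by itself violate the model.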
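As a quick check of the conditional-binomial argument, here is a small R sketch under an assumed setup that is not from the original post: counts of the type of interest and of all other events are independent Poisson with means mu*p and mu*(1-p), so the total is Poisson(mu). The values of mu, p and the conditioning total n0 are arbitrary illustrative choices. The count and the total are positively correlated, yet among replicates with a given total n0 the count should follow dbinom(0:n0, n0, p).

```r
## Assumed setup (illustration only): independent Poisson counts
set.seed(101)
mu <- 10; p <- 0.3
x <- rpois(1e5, mu*p)        # events of the type of interest
y <- rpois(1e5, mu*(1-p))    # all other events
n <- x + y                   # total number of events
cor(x, n)                    # positive: count and total move together

## condition on one particular total and compare with the binomial
n0 <- 10
emp <- table(factor(x[n == n0], levels = 0:n0)) / sum(n == n0)
theo <- dbinom(0:n0, size = n0, prob = p)
round(cbind(empirical = as.numeric(emp), binomial = theo), 3)
```

The empirical and binomial columns should agree up to simulation noise, which is why a binomial GLM on (count, total - count) is appropriate even when the totals trend over time.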

I ran a numerical experiment to see whether the standard errors
were appropriate for a simple example of this type (if the
correlation were going to screw something up, it would most likely
be the standard errors/confidence intervals rather than
the point estimates):

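## simulate one data set; return the coefficient-table row for 'time'
dosim <- function() {
  time <- sort(runif(200))                  # 200 observation times in (0,1)
  nevents <- rpois(200,10*time)             # total counts increase with time
  ## counts of the type of interest, binomial given the total;
  ## the true slope on the logit scale is 10
  type <- rbinom(200,size=nevents,prob=plogis(10*(time-0.5)))
  evmat <- cbind(type,nevents-type)         # (successes, failures)
  m1 <- glm(evmat~time,family="binomial")
  coef(summary(m1))["time",]  # Estimate, Std. Error, z value, Pr(>|z|)
}

set.seed(1001)
r1 <- replicate(1000,dosim())  # 4 x 1000 matrix, one column per simulation
rownames(r1)
true <- 10
## does each nominal 95% Wald interval contain the true slope?
cover <- (r1["Estimate",]<true+1.96*r1["Std. Error",] &
          r1["Estimate",]>true-1.96*r1["Std. Error",])
sum(cover)/1000  # proportion of intervals covering the true value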

The estimated coverage came out to 0.941, close to the nominal 0.95, which seems reasonable ...
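To put "seems reasonable" on a firmer footing, a minimal sketch can compare 941 covering intervals out of 1000 with the nominal 95% rate. The Monte Carlo standard error below uses the usual binomial formula sqrt(p(1-p)/nsim) evaluated at p = 0.95, and binom.test gives an exact cross-check; the variable names are illustrative only.

```r
## Is 941/1000 consistent with nominal 95% coverage?
nsim <- 1000
covered <- 941                       # count reported above
phat <- covered/nsim                 # observed coverage
mcse <- sqrt(0.95*(1-0.95)/nsim)     # Monte Carlo SE if true coverage is 0.95
z <- (phat - 0.95)/mcse              # distance from nominal, in SE units
round(c(phat = phat, mcse = mcse, z = z), 4)
binom.test(covered, nsim, p = 0.95)$p.value   # exact two-sided test
```

The shortfall of 0.009 is only about 1.3 Monte Carlo standard errors, so with 1000 replicates there is no real evidence that the standard errors are too small.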

  I'm hoping/figuring that someone more knowledgeable will
jump in with corrections if I've said something terribly
wrong ...

  Ben Bolker