# [R] Does logistic regression require the independence of samples?

Ben Bolker bolker at ufl.edu
Mon Apr 3 20:17:13 CEST 2006

```
HelponR <suncertain <at> gmail.com> writes:

>
> Dear list:
>
> Thanks a lot for help. I have a question and I could not find clear answers
> easily.
>
> When we do logistic regression for one type of event of interest as a
> proportion of a broader class of events, does the logistic regression assume
> that the total number of events of all types is independent of the
> number of events of the type of interest?
>
> For example, if the counts of one type of event and of all events are two time
> series, but they vary in the same fashion (both increase or
> decrease with time), can we still use logistic regression to figure out the
> effect of time on the proportion? If not, what is the right thing to do?
>

The answer to the general question in the subject is "yes" (logistic
regression will fail if observations are correlated), but I think
in this particular case that it's OK; for a multinomial sample,
the numbers of each type are binomial conditional on the total
number of all types.

I ran a numerical experiment to see if the standard errors
were appropriate for a simple example of this type (if the
correlation were going to screw something up it would be likely
to be the standard errors/confidence intervals rather than
the point estimates):

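dosim <- function() {
  ## 200 observation times, uniform on (0,1)
  time <- sort(runif(200))
  ## total number of events rises with time
  nevents <- rpois(200,10*time)
  ## events of interest: binomial given the total, true logit slope = 10
  type <- rbinom(200,size=nevents,prob=plogis(10*(time-0.5)))
  evmat <- cbind(type,nevents-type)
  m1 <- glm(evmat~time,family="binomial")
  ## estimate, std. error, z and p-value for the slope
  coef(summary(m1))["time",]
}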

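set.seed(1001)
r1 <- replicate(1000,dosim())
rownames(r1)
true <- 10   ## true slope used in dosim()
## does the nominal 95% Wald interval cover the true slope?
cover <- (r1["Estimate",]<true+1.96*r1["Std. Error",] &
          r1["Estimate",]>true-1.96*r1["Std. Error",])
mean(cover)  ## proportion of intervals that cover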

The answer came out to 0.941, which seems reasonable for a nominal 95% interval ...

I'm hoping/figuring that someone more knowledgeable will
jump in with corrections if I've said something terribly
wrong ...

Ben Bolker

```
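
One way to judge whether a coverage of 0.941 is "reasonable" is to compare it with the simulation error you would expect from 1000 replicates. The sketch below is not from the original post. It assumes the replicates are independent, so that the number of covering intervals is binomial with success probability 0.95 when the intervals are well calibrated. The names `nsim`, `nominal`, `observed`, `mc_se` and `z` are introduced here only for illustration.

```r
## Monte Carlo check on the reported coverage (illustrative sketch only).
## Assumption: the 1000 replicates are independent, so the count of covering
## intervals is Binomial(nsim, nominal) if the intervals are well calibrated.
nsim     <- 1000    # number of replicates used in the post
nominal  <- 0.95    # nominal level implied by the 1.96 multiplier
observed <- 0.941   # coverage reported in the post

mc_se <- sqrt(nominal * (1 - nominal) / nsim)  # simulation standard error
z     <- (observed - nominal) / mc_se          # gap from nominal, in SE units

round(c(mc_se = mc_se, z = z), 3)
```

The simulation standard error is about 0.007, so the observed coverage sits roughly 1.3 standard errors below 0.95, which is within ordinary Monte Carlo noise under these assumptions.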