[R] Logistic regression model selection with overdispersed/autocorrelated data

Tue Jan 31 17:09:00 CET 2006

Jesse.Whittington at pc.gc.ca wrote:
>
> I am creating habitat selection models for caribou and other species with
> data collected from GPS collars.  In my current situation the radio-collars
> recorded the locations of 30 caribou every 6 hours.  I am then comparing
> resources used at caribou locations to random locations using logistic
> regression (standard habitat analysis).
>
> The data are therefore highly autocorrelated, and this inflates the Type I
> error rate in two ways: small standard errors around the beta coefficients
> and over-parameterization during model selection.  Robust standard errors are
> easily calculated by block-bootstrapping the data using "animal" as the
> cluster with the Design library; however, I haven't found a satisfactory
> solution for model selection.
>
> A couple options are:
> 1.  Using QAIC, where the deviance is divided by a variance inflation factor
> (Burnham & Anderson).  However, this VIF can vary greatly depending on the
> data set and the set of covariates used in the global model.
> 2.  Manual forward stepwise regression using both changes in deviance and
> robust p-values for the beta-coefficients.
>
> I have been looking for a solution to this problem for a couple years and
> would appreciate any advice.
>
> Jesse
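As a rough illustration of option 1, QAIC can be computed in base R from any fitted binomial glm once a variance inflation factor (c-hat) has been chosen. This is a sketch under stated assumptions: the helper name qaic, the simulated data, and the value chat = 2 are hypothetical, and K counts one extra parameter for c-hat, following the Burnham & Anderson convention. Estimating c-hat itself is the hard part for ungrouped 0/1 responses, which is exactly the concern raised above; here it is simply supplied by the user.

```r
# QAIC = -2 log L / c-hat + 2K for a fitted binomial glm.
# c-hat is supplied by the user (hypothetical value below);
# K = number of regression coefficients + 1 for estimating c-hat.
qaic <- function(fit, chat) {
  ll <- as.numeric(logLik(fit))
  K <- length(coef(fit)) + 1
  -2 * ll / chat + 2 * K
}

# Simulated example data (hypothetical, for illustration only)
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1))

m1 <- glm(y ~ x1, family = binomial, data = dat)
m2 <- glm(y ~ x1 + x2, family = binomial, data = dat)

chat <- 2  # hypothetical variance inflation factor from the global model
sapply(list(m1 = m1, m2 = m2), qaic, chat = chat)
```

With chat = 1 this reduces to AIC + 2 (the extra 2 being the penalty for the c-hat parameter); larger values of chat shrink the likelihood term relative to the penalty, so QAIC tends to favour smaller models.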

Frank E Harrell Jr wrote:

If you must do non-subject-matter-driven model selection, look at the
fastbw function in Design, which will use the cluster bootstrap variance
matrix.

Frank

Thanks for the tip.  I didn't know that the fastbw function could account
for the clustered variance.  For others, the code to run such a model from
the Design library would be:

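library(Design)  # note: Design has since been superseded by the rms package

# fit the model, storing the design matrix (x) and response (y) for bootstrapping
model.1 <- lrm(y ~ x1 + x2 + x3 + x4, data = data, x = TRUE, y = TRUE)
# cluster bootstrap: resample whole animals to get a robust variance matrix
model.2 <- bootcov(model.1, cluster = data$animal, B = 10000)
# fast backward step-down selection using the bootstrap variance matrix
fastbw(model.2)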
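For readers curious about what the cluster bootstrap in bootcov is doing, here is a minimal base-R sketch of the same idea (not Design's implementation): resample whole animals with replacement, refit the logistic regression, and take the covariance of the coefficient replicates. The function name cluster_boot_vcov, the simulated data, and B = 200 are hypothetical choices for illustration.

```r
# Minimal cluster (block) bootstrap of a logistic regression's
# coefficient covariance matrix (illustrative sketch, not bootcov itself).
# Whole animals are resampled with replacement, so each animal's fixes
# stay together and within-animal correlation is preserved.
cluster_boot_vcov <- function(formula, data, cluster, B = 200) {
  ids <- unique(data[[cluster]])
  reps <- replicate(B, {
    drawn <- sample(ids, length(ids), replace = TRUE)
    # stack the rows of every drawn animal (an animal drawn twice appears twice)
    boot <- do.call(rbind, lapply(drawn, function(id) data[data[[cluster]] == id, ]))
    coef(glm(formula, family = binomial, data = boot))
  })
  cov(t(reps))  # replicates are in columns; transpose so each row is one replicate
}

# Simulated data (hypothetical): 30 animals with an animal-level
# random intercept, which induces correlation among each animal's fixes.
set.seed(42)
n_animal <- 30
n_per <- 40
animal <- rep(seq_len(n_animal), each = n_per)
x1 <- rnorm(n_animal * n_per)
u <- rnorm(n_animal, sd = 1)[animal]
y <- rbinom(length(x1), 1, plogis(-0.5 + 0.8 * x1 + u))
dat <- data.frame(y, x1, animal)

v_naive <- vcov(glm(y ~ x1, family = binomial, data = dat))
v_boot <- cluster_boot_vcov(y ~ x1, dat, cluster = "animal", B = 200)
rbind(naive = sqrt(diag(v_naive)), cluster = sqrt(diag(v_boot)))
```

Because each animal's fixes enter or leave a replicate together, the replicates reflect between-animal variation, which is why these standard errors are typically larger than the naive glm ones when fixes from the same animal are correlated.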

Later we will examine individual caribou responses to trails
(subject-specific model selection).  For this we plan to use mixed effects
models (lmer).  Is this what you would also recommend?

I look forward to reading the new edition of your book when it is
published.

Jesse