[R] Stepwise selection with qAIC and qBIC

Wed Sep 4 11:11:59 CEST 2013

Here is a solution I applied using qAIC and package bbmle so I share it 
for next ones. It is not really automatized as I need to read every 
results of the drop() test an enter manually the less significant 
variable but I guess a function can be created in this goal.

nullQ <- update (null, family = quasibinomial)
fullQ <- update (full, family = quasibinomial)

# backward manual selection #Zuur, 2010 p.227 ; Faraway, 2006 p.149
drop1(fullQ, test= "F")
modelQ1 <- update(fullQ, .~. - #Highest p_value (not significant)#)
drop1(modelQ1, test = "F")
modelQ2 <- update(modelQ1, .~. -#2nd Highest p_value (not significant)#)
drop1(modelQ2, test = "F")
modelQ3 <- update(modelQ2, .~. - # 3rd #)
drop1(modelQ3, test = "F")
modelQ4 <- update(modelQ3, .~. - # 4th #)
drop1(modelQ4, test = "F")
modelQ5 <- update(modelQ4, .~. - # 5th #)
drop1(modelQ5, test = "F")
modelQ6 <- update(modelQ5, .~. - # 6th #)
drop1(modelQ6, test = "F")

library(bbmle)

# overdispersion parameter calculated from the most complex model as the 
sum of squares Pearson residuals divided by the number of degrees of 
freedom # Burnham & Anderson, 2002, p.67

Qi2 <- sum(residuals(fullQ, type= "pearson")^2)
dfr <-  summary(fullQ)$df.residual
disp <- Qi2/dfr

full <- update(fullQ, family = binomial) # we need to retrieve the 
loglikelihood so we use the "binomial model"
model1 <- update(modelQ1, family = binomial)
model2 <- update(modelQ2, family = binomial)
model3 <- update(modelQ3, family = binomial)
model4 <- update(modelQ4, family = binomial)
model5 <- update(modelQ5, family = binomial)
model6 <- update(modelQ6, family = binomial)

qAICtab <- ICtab (full, model1, model2, model3, model4, model5, model6, 
dispersion = disp, type = "qAIC", base = TRUE) # we use the global 
dispersion parameter as recommended in Burnham & Anderson, 2002

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

Le 28/08/2013 17:34, Xochitl CORMON a écrit :
> Dear list,
>
> I am currently working with presence/absence GLM. Therefore I am using
> binomial family and selection my models this way :
>
> null <- glm(respvarPAT ~ 1 , family = binomial, data = datafit)
> full <- glm(respvarPAT ~ CSpp + FSpp + Gpp + Mpp + Ppp + Lpp + TempPoly2
> + DepthPoly2 + DepthPoly3 , family = binomial, data = datafit)
> model1 <- stepAIC(full, scope = list(lower = null, upper = full),
> direction = "backward") #AIC backward
> model2 <- stepAIC(full, scope = list(lower = null, upper = full),
> direction = "backward", k=log(nobs(full))) #BIC backward
> model3 <- stepAIC(null, scope = list(lower = null, upper = full),
> direction = "forward") #AIC forward
> model4 <- stepAIC(null, scope = list(lower = null, upper = full),
> direction = "forward", k=log(nobs(full))) #BIC forward
> model5 <- stepAIC(null, scope = list(lower = null, upper = full),
> direction = "both") #AIC both
> model6 <- stepAIC(null, scope = list(lower = null, upper = full),
> direction = "both", k=log(nobs(full))) #BIC both
>
> Every model generated are actually identical being :
>
> glm(formula = respvarPAT ~ DepthPoly2 + DepthPoly3 + Gpp + Mpp +
> TempPoly2 + Lpp, family = binomial, data = datafit)
>
> This worked pretty good for the last set of explanatory variables I used
> however for these ones I have a bit of overdispersion problem.
> Dispersion parameter for more complex model(full) being : 7.415653.
>
> I looked into literature and found out that I should use qAIC and qBIC
> for selection. Thus my former stepwise selection is biased as using AIC
> and BIC (binomial family).
>
> My question is to know if there is way to change the k parameter in
> stepAIC in order to get quasi criterion. If not is there a way to
> automatize the selection using this criterion and having the dispersion
> parameter, customizing stepAIC function for example? Unfortunately I am
> reaching here my abilities in statistics and programming and cannot
> figure out if what I want to do is doable or not.
>
> Thank you for your help,
>
> Regards,
>
> Xochitl C.
>