[R] glm and stepAIC selects too many effects

Marc Girondot marc_grt at yahoo.fr
Tue Jun 6 08:08:43 CEST 2017


This question sits on the border between statistics and R.

When I fit a glm with many potential effects and select a model using 
stepAIC, many independent variables are selected even when there is no 
relationship between the dependent variable and the effects (all are 
random numbers).

Does anyone have a solution to prevent this effect? Is it related to 
the Bonferroni correction?

Is there a ratio of the number of independent variables to the number 
of observations that is safe for stepAIC?

Thanks

Marc

Example code. When 2 independent variables are included, no effect is 
selected; when 11 are included, 7 to 8 are selected.

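library(MASS)  # provides stepAIC()

# Response and predictors are independent random draws (pure noise)
x <- rnorm(15, 15, 2)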
A <- rnorm(15, 20, 5)
B <- rnorm(15, 20, 5)
C <- rnorm(15, 20, 5)
D <- rnorm(15, 20, 5)
E <- rnorm(15, 20, 5)
F <- rnorm(15, 20, 5)
G <- rnorm(15, 20, 5)
H <- rnorm(15, 20, 5)
I <- rnorm(15, 20, 5)
J <- rnorm(15, 20, 5)
K <- rnorm(15, 20, 5)

df <- data.frame(x=x, A=A, B=B, C=C, D=D,
                  E=E, F=F, G=G, H=H, I=I, J=J,
                  K=K)

G1 <- glm(formula = x ~ A + B,
          data=df, family = gaussian(link = "identity"))

g1 <- stepAIC(G1)

summary(g1)

G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
          data=df, family = gaussian(link = "identity"))

g2 <- stepAIC(G2)

summary(g2)
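
To see how often this happens rather than relying on a single draw, the example above can be repeated many times. The following is only a minimal sketch: count_selected() is a hypothetical helper (not part of MASS), and the number of repetitions (50) is an arbitrary choice. It also tries the k argument of stepAIC with k = log(n), the BIC-style penalty, as one possible stricter criterion; this is an assumption about what might help, not a guaranteed fix.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: simulate pure-noise data with the same layout as
# the example above and count how many predictors stepAIC() retains
# under a given penalty k (k = 2 is the usual AIC).
count_selected <- function(n_obs = 15, n_pred = 11, k = 2) {
  dat <- as.data.frame(matrix(rnorm(n_obs * n_pred, 20, 5), nrow = n_obs))
  names(dat) <- LETTERS[seq_len(n_pred)]
  dat$x <- rnorm(n_obs, 15, 2)
  full <- glm(x ~ ., data = dat, family = gaussian(link = "identity"))
  fit <- stepAIC(full, k = k, trace = 0)  # trace = 0 suppresses printing
  length(coef(fit)) - 1                   # exclude the intercept
}

set.seed(1)  # arbitrary seed, only for reproducibility
n_aic <- replicate(50, count_selected(k = 2))        # AIC penalty
n_bic <- replicate(50, count_selected(k = log(15)))  # BIC-style penalty

mean(n_aic)  # average number of noise predictors kept with AIC
mean(n_bic)  # same with the stricter penalty
```

Comparing the two means gives a rough idea of how strongly the penalty affects the number of spurious effects retained for 15 observations and 11 candidates.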


