[R] [FORGED] glm and stepAIC selects too many effects

Rolf Turner r.turner at auckland.ac.nz
Tue Jun 6 08:25:38 CEST 2017


On 06/06/17 18:08, Marc Girondot via R-help wrote:
> This is a question at the border between stats and R.
> 
> When I fit a glm with many potential effects and select a model using 
> stepAIC, many independent variables are selected even if there is no 
> relationship between the dependent variable and the effects (all are 
> random numbers).
> 
> Does someone have a solution to prevent this effect? Is it related to 
> the Bonferroni correction?
> 
> Is there a ratio of independent variables to number of observations 
> that is safe for stepAIC?
> 
> Thanks
> 
> Marc
> 
> Example code: when 2 independent variables are included, no effect is 
> selected; when 11 are included, 7 to 8 are selected.
> 
> library(MASS)  # provides stepAIC()
> 
> x <- rnorm(15, 15, 2)
> A <- rnorm(15, 20, 5)
> B <- rnorm(15, 20, 5)
> C <- rnorm(15, 20, 5)
> D <- rnorm(15, 20, 5)
> E <- rnorm(15, 20, 5)
> F <- rnorm(15, 20, 5)
> G <- rnorm(15, 20, 5)
> H <- rnorm(15, 20, 5)
> I <- rnorm(15, 20, 5)
> J <- rnorm(15, 20, 5)
> K <- rnorm(15, 20, 5)
> 
> df <- data.frame(x=x, A=A, B=B, C=C, D=D,
>                   E=E, F=F, G=G, H=H, I=I, J=J,
>                   K=K)
> 
> G1 <- glm(formula = x ~ A + B,
>           data=df, family = gaussian(link = "identity"))
> 
> g1 <- stepAIC(G1)
> 
> summary(g1)
> 
> G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
>           data=df, family = gaussian(link = "identity"))
> 
> g2 <- stepAIC(G2)
> 
> summary(g2)

IMHO there's nothing much that you can do about this.  Trying to get the 
data to select a model is always fraught with peril.

The phenomenon that you have observed has been remarked on before; see
Alan Miller's book "Subset Selection in Regression" (Chapman and Hall, 
1990), page 12 (first paragraph of section 1.4).

However you might find some of Miller's recommendations to be at least a 
*bit* useful.
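
For anyone who wants to see how often this happens rather than rely on a
single run, here is a rough simulation sketch. It repeats the setup from the
quoted example (15 observations, 11 pure-noise predictors) many times and
counts how many predictors stepAIC() retains each time. The helper name
count_selected, its arguments n_obs and n_pred, the seed, and the choice of
200 replications are illustrative assumptions of mine, not anything from the
original code or from Miller's book.

```r
library(MASS)  # provides stepAIC()

# Count how many pure-noise predictors stepAIC() keeps in one simulated data set.
# Defaults mirror the quoted example: 15 observations, 11 candidate predictors.
# (count_selected, n_obs and n_pred are hypothetical names used for illustration.)
count_selected <- function(n_obs = 15, n_pred = 11) {
  # Response and predictors are independent random draws, so no true effect exists.
  dat <- data.frame(x = rnorm(n_obs, 15, 2),
                    matrix(rnorm(n_obs * n_pred, 20, 5), nrow = n_obs))
  full <- glm(x ~ ., data = dat, family = gaussian(link = "identity"))
  sel <- stepAIC(full, trace = 0)   # trace = 0 suppresses the step-by-step log
  length(coef(sel)) - 1             # retained predictors, excluding the intercept
}

set.seed(1)                         # arbitrary seed, for reproducibility only
n_kept <- replicate(200, count_selected())

table(n_kept)                       # distribution of the number of noise terms kept
mean(n_kept > 0)                    # share of runs keeping at least one noise term
```

Calling count_selected(n_pred = 2) in place of the default gives the
two-predictor case for comparison. Raising n_obs while holding n_pred fixed
shows how the ratio of observations to candidate variables changes the
picture.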

cheers,

Rolf Turner

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


