[R] boot.stepAIC fails with computed formula

Stephen O'hagan SOhagan at manchester.ac.uk
Wed Aug 23 12:23:42 CEST 2017


As far as I can tell from the docs, strt & frm1 should be the simplest model, 'y1~1', since we’re doing stepAIC in the forward direction.

I found that using 'y1~.' for frm2 doesn't always work, hence the explicit enumeration of the xvars into a full formula. I had forgotten that you don't need the +1 term.

With real data, if you start from the most complex model and go backwards, stepAIC just over-fits and gives up  [for same reason I set the steps parameter].

Cheers,
SGO

-----Original Message-----
From: Bert Gunter [mailto:bgunter.4567 at gmail.com] 
Sent: 22 August 2017 18:51
To: Stephen O'hagan <SOhagan at manchester.ac.uk>
Cc: r-help at r-project.org
Subject: Re: [R] boot.stepAIC fails with computed formula

SImplify your call to lm using the "." argument instead of manipulating formulas.
> strt <- lm(y1 ~ ., data = dat)

and you do not need to explicitly specify the "1+" on the rhs for lm, so

> frm2<-as.formula(paste(trg," ~  ", paste(xvars,collapse = "+")))
works fine, too.

Anyway, doing this gives (but see end of output)"


bst <- boot.stepAIC(strt,data =
dat,B=50,alpha=0.05,direction='forward',steps=limit,
                  scope=list(lower=frm1,upper=frm2))

> bst

Summary of Bootstrapping the 'stepAIC()' procedure for

Call:
lm(formula = y1 ~ ., data = dat)

Bootstrap samples: 50
Direction: forward
Penalty: 2 * df

Covariates selected
   (%)
x1 100
x2 100
x3 100
x4 100
x5 100
x6 100
x7 100
x8 100

Coefficients Sign
   + (%) - (%)
x3   100     0
x6   100     0
x8   100     0
x1    90    10
x2    86    14
x7    62    38
x4    44    56
x5     0   100

Stat Significance
   (%)
x3 100
x6 100
x8 100
x5  34
x2  20
x1  16
x4  10
x7   4


The stepAIC() for the original data-set gave

Call:
lm(formula = y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat)

Coefficients:
(Intercept)           x1           x2           x3           x4           x5
  42.008824     0.012304     0.010925     0.976469    -0.005183    -0.021041
         x6           x7           x8
   2.000649     0.004461     3.007071


Stepwise Model Path
Analysis of Deviance Table

Initial Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8

Final Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8


  Step Df Deviance Resid. Df Resid. Dev     AIC
1                        191   16.14592 -485.33


HOWEVER, I do not know why your failed calls failed. In view of the above, it appears to be a bug in the formula interface, so if you do not get a satisfactory answer here (i.e. I am wrong about this), you should contact the maintainer.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 10:08 AM, Steve O'Hagan <SOHagan at manchester.ac.uk> wrote:
> The error is "the model fit failed in 50 bootstrap samples
> Error: non-character argument"
>
> Cheers,
> SOH.
>
>
> On 22/08/2017 17:52, Bert Gunter wrote:
>>
>> Failed?  What was the error message?
>>
>> Cheers,
>>
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming 
>> along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan 
>> <SOhagan at manchester.ac.uk> wrote:
>>>
>>> I'm trying to use boot.stepAIC for feature selection; I need to be 
>>> able to specify the name of the dependent variable programmatically, 
>>> but this appear to fail:
>>>
>>> In R-Studio with MS R Open 3.4:
>>>
>>> library(bootStepAIC)
>>>
>>> #Fake data
>>> n<-200
>>>
>>> x1 <- runif(n, -3, 3)
>>> x2 <- runif(n, -3, 3)
>>> x3 <- runif(n, -3, 3)
>>> x4 <- runif(n, -3, 3)
>>> x5 <- runif(n, -3, 3)
>>> x6 <- runif(n, -3, 3)
>>> x7 <- runif(n, -3, 3)
>>> x8 <- runif(n, -3, 3)
>>> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>>>
>>> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
>>> #the real data won't have these names...
>>>
>>> cn <- names(dat)
>>> trg <- "y1"
>>> xvars <- cn[cn!=trg]
>>>
>>> frm1<-as.formula(paste(trg,"~1"))
>>> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>>>
>>> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>>>
>>> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>>>
>>> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>>>
>>> limit<-5
>>>
>>>
>>> stp=stepAIC(strt,direction='forward',steps=limit,
>>>              scope=list(lower=frm1,upper=frm2))
>>>
>>> bst <-
>>> boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
>>>                      scope=list(lower=frm1,upper=frm2))
>>>
>>> b1 <- bst$Covariates
>>> ball <- data.frame(b1)
>>> names(ball)=unlist(trg)
>>>
>>> Any ideas?
>>>
>>> Cheers,
>>> SOH
>>>
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
>
>


More information about the R-help mailing list