[R] boot.stepAIC fails with computed formula
Stephen O'hagan
SOhagan at manchester.ac.uk
Wed Aug 23 12:23:42 CEST 2017
As far as I can tell from the docs, strt & frm1 should be the simplest model, 'y1~1', since we’re doing stepAIC in the forward direction.
I found that using 'y1~.' for frm2 doesn't always work, hence the explicit enumeration of the xvars into a full formula. I had forgotten that you don't need the +1 term.
With real data, if you start from the most complex model and go backwards, stepAIC just over-fits and gives up [for same reason I set the steps parameter].
Cheers,
SGO
-----Original Message-----
From: Bert Gunter [mailto:bgunter.4567 at gmail.com]
Sent: 22 August 2017 18:51
To: Stephen O'hagan <SOhagan at manchester.ac.uk>
Cc: r-help at r-project.org
Subject: Re: [R] boot.stepAIC fails with computed formula
SImplify your call to lm using the "." argument instead of manipulating formulas.
> strt <- lm(y1 ~ ., data = dat)
and you do not need to explicitly specify the "1+" on the rhs for lm, so
> frm2<-as.formula(paste(trg," ~ ", paste(xvars,collapse = "+")))
works fine, too.
Anyway, doing this gives (but see end of output)"
bst <- boot.stepAIC(strt,data =
dat,B=50,alpha=0.05,direction='forward',steps=limit,
scope=list(lower=frm1,upper=frm2))
> bst
Summary of Bootstrapping the 'stepAIC()' procedure for
Call:
lm(formula = y1 ~ ., data = dat)
Bootstrap samples: 50
Direction: forward
Penalty: 2 * df
Covariates selected
(%)
x1 100
x2 100
x3 100
x4 100
x5 100
x6 100
x7 100
x8 100
Coefficients Sign
+ (%) - (%)
x3 100 0
x6 100 0
x8 100 0
x1 90 10
x2 86 14
x7 62 38
x4 44 56
x5 0 100
Stat Significance
(%)
x3 100
x6 100
x8 100
x5 34
x2 20
x1 16
x4 10
x7 4
The stepAIC() for the original data-set gave
Call:
lm(formula = y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat)
Coefficients:
(Intercept) x1 x2 x3 x4 x5
42.008824 0.012304 0.010925 0.976469 -0.005183 -0.021041
x6 x7 x8
2.000649 0.004461 3.007071
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8
Final Model:
y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8
Step Df Deviance Resid. Df Resid. Dev AIC
1 191 16.14592 -485.33
HOWEVER, I do not know why your failed calls failed. In view of the above, it appears to be a bug in the formula interface, so if you do not get a satisfactory answer here (i.e. I am wrong about this), you should contact the maintainer.
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Aug 22, 2017 at 10:08 AM, Steve O'Hagan <SOHagan at manchester.ac.uk> wrote:
> The error is "the model fit failed in 50 bootstrap samples
> Error: non-character argument"
>
> Cheers,
> SOH.
>
>
> On 22/08/2017 17:52, Bert Gunter wrote:
>>
>> Failed? What was the error message?
>>
>> Cheers,
>>
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming
>> along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan
>> <SOhagan at manchester.ac.uk> wrote:
>>>
>>> I'm trying to use boot.stepAIC for feature selection; I need to be
>>> able to specify the name of the dependent variable programmatically,
>>> but this appear to fail:
>>>
>>> In R-Studio with MS R Open 3.4:
>>>
>>> library(bootStepAIC)
>>>
>>> #Fake data
>>> n<-200
>>>
>>> x1 <- runif(n, -3, 3)
>>> x2 <- runif(n, -3, 3)
>>> x3 <- runif(n, -3, 3)
>>> x4 <- runif(n, -3, 3)
>>> x5 <- runif(n, -3, 3)
>>> x6 <- runif(n, -3, 3)
>>> x7 <- runif(n, -3, 3)
>>> x8 <- runif(n, -3, 3)
>>> y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)
>>>
>>> dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
>>> #the real data won't have these names...
>>>
>>> cn <- names(dat)
>>> trg <- "y1"
>>> xvars <- cn[cn!=trg]
>>>
>>> frm1<-as.formula(paste(trg,"~1"))
>>> frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))
>>>
>>> strt=lm(y1~1,dat) # boot.stepAIC Works fine
>>>
>>> #strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##
>>>
>>> #strt=lm(frm1,dat) ## boot.stepAIC FAILS ##
>>>
>>> limit<-5
>>>
>>>
>>> stp=stepAIC(strt,direction='forward',steps=limit,
>>> scope=list(lower=frm1,upper=frm2))
>>>
>>> bst <-
>>> boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
>>> scope=list(lower=frm1,upper=frm2))
>>>
>>> b1 <- bst$Covariates
>>> ball <- data.frame(b1)
>>> names(ball)=unlist(trg)
>>>
>>> Any ideas?
>>>
>>> Cheers,
>>> SOH
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
More information about the R-help
mailing list