[R] help on model selection - step()

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 11 09:18:23 CEST 2008


On Mon, 11 Aug 2008, Rodrigo Gazaffi wrote:

> Dear R users,
> I'm interested in the model selection problem, and I have run into some problems
> that I would like to ask for help with.
>
> Well,
> this is a very small example with 4 variables (just one, z, is the response)
> and 100 individuals.
> I would like to do a stepwise search for the "best" model, using the BIC
> criterion.
>
> I know that when I have a lot of variables, let's say 120, it's not wise to
> consider the full model, so starting from "y~1", I can stop the search with
> the option steps.
> But when the IC has a negative value, is there any way that I can
> stop the search?

Not in the existing function.  The absolute size of AIC (or BIC) has no 
meaning for a linear model fit, since the same is true of the log-likelihood: 
its value depends on the scale on which the response is measured.
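A small simulation makes the scale point concrete (a sketch; the data and variable names below are hypothetical, not Rodrigo's). For an lm fit, step() scores each model with extractAIC(), which reports n*log(RSS/n) + k*edf with additive constants dropped; for example, the starting value 3.6 in Rodrigo's output is 100*log(99/100) + log(100)*1. Multiplying the response by a constant c multiplies every RSS by c^2 and so shifts every model's value by the same amount, 2*n*log(c). The sign of the criterion can therefore be flipped just by changing units, while the differences between models, which are what step() compares, are untouched.

```r
# Hypothetical simulated data: the same response recorded in metres
# and in kilometres.
set.seed(1)
n    <- 100
x1   <- rnorm(n)
z_m  <- 0.5 * x1 + rnorm(n)
z_km <- z_m / 1000

# BIC on the same scale step() uses: extractAIC() with k = log(n)
bic <- function(fit) extractAIC(fit, k = log(n))[2]

b_m  <- c(null = bic(lm(z_m ~ 1)),  x1 = bic(lm(z_m ~ x1)))
b_km <- c(null = bic(lm(z_km ~ 1)), x1 = bic(lm(z_km ~ x1)))

print(b_m)            # one set of absolute values
print(b_km)           # every value shifted by 2 * n * log(1/1000)
print(b_km - b_m)     # the constant shift, about -1381.6 for both models
print(diff(b_m))      # difference between the two models ...
print(diff(b_km))     # ... is the same whatever the units
```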

But R is Open Source so you can modify step() in any way you like, even to 
do nonsensical things.
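If one does want a different stopping rule, a short hand-written loop can be easier than editing step() itself. The sketch below is forward-only (unlike direction = "both" in Rodrigo's call), and the function forward_bic(), its stop_if argument, the simulated data and the threshold of 2 in the usage are all hypothetical choices for illustration. It relies on add1(), which scores each one-term addition with the same criterion step() uses. The example rule compares successive values rather than testing the sign of the BIC, which, as explained above, carries no meaning.

```r
# Hypothetical simulated data standing in for Rodrigo's: z depends on x1 only.
set.seed(2)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$z <- 2 * d$x1 + rnorm(n)

# Hypothetical helper, not part of R: forward selection by BIC built on
# add1(), plus a user-supplied rule that can end the search early.
forward_bic <- function(fit, upper,
                        stop_if = function(old_ic, new_ic) FALSE) {
  k  <- log(nobs(fit))                       # BIC penalty, i.e. k = log(n) in step()
  ic <- extractAIC(fit, k = k)[2]            # criterion of the current model
  repeat {
    # terms in 'upper' that are not yet in the model
    left <- setdiff(attr(terms(upper), "term.labels"),
                    attr(terms(fit), "term.labels"))
    if (length(left) == 0) break
    cand <- add1(fit, scope = left, k = k)   # criterion for each one-term addition
    best <- rownames(cand)[which.min(cand$AIC)]
    if (best == "<none>") break              # usual rule: no addition lowers BIC
    new_ic <- min(cand$AIC)
    if (stop_if(ic, new_ic)) break           # extra rule supplied by the user
    fit <- update(fit, as.formula(paste(". ~ . +", best)))
    ic  <- new_ic
  }
  fit
}

# Default: stop only when no addition improves BIC.
fit_default <- forward_bic(lm(z ~ 1, data = d), ~ x1 + x2 + x3)
# Example rule: also stop once the best addition improves BIC by less than 2.
fit_rule <- forward_bic(lm(z ~ 1, data = d), ~ x1 + x2 + x3,
                        stop_if = function(old_ic, new_ic) old_ic - new_ic < 2)
print(formula(fit_default))
print(formula(fit_rule))
```

Because this rule looks only at the change between successive values, it gives the same answer whatever units z is measured in, which a rule based on the sign of the BIC would not.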

> For example, for this data set
> the first step gives AIC=3.6 and the 2nd gives -9.03. IS THERE ANY WAY that
> I could say "stop here, the previous one is the best for me"? Here,
> my model would have no variables.
> I know that example looks silly, but I have bigger data where this
> happens in the thirtieth iteration; that's why I would like some help.
>
> I used step(). Is there another function besides step() that could stop
> the search this way?
>
> cheers,
> Rodrigo Gazaffi
>
>
>
> x1 <- c( 0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,
> 0.3718, -1.0000,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,
> 0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.0713,
> 0.1774,  0.3570,  0.3718,  0.3718,  0.3718, -1.0000,  0.3718, -1.0000,
> 0.1774,  0.3718,  0.3718,  0.0709,  0.1774, -1.0000, -1.0000,  0.3718,
> 0.3718,  0.0713,  0.0709,  0.3718,  0.3718,  0.3718,  0.3718,  0.2614,
> 0.2614, -0.9995, -1.0000,  0.1774,  0.3718, -1.0000, -1.0000,  0.1774,
> 0.3718,  0.1774,  0.3718,  0.3718, -1.0000,  0.3718,  0.3718,  0.3718,
> 0.3718,  0.3718, -1.0000,  0.3718,  0.3718,  0.3718,  0.3718,  0.0709,
> 0.0710,  0.3718,  0.3718,  0.3718,  0.3718,  0.3718,  0.0709,  0.3718,
> 0.0709,  0.0709,  0.3718,  0.0709,  0.3570,  0.3718,  0.3718,  0.3718,
> 0.0709,  0.3718,  0.3718,  0.3718, -1.0000,  0.3718,  0.3718,  0.3718,
> -1.0000,  0.3718,  0.3718,  0.3718,  0.3718)
>
> x2 <- c( 0.3898, -0.9995,  0.3898,  0.3898,  0.3898,  0.1978,  0.3898,
> -0.9997, -1.0000, -1.0000,  0.3898,  0.3898,  0.3898,  0.3898, -1.0000,
> 0.1978, -1.0000,  0.3898,  0.3898, -1.0000,  0.1978,  0.3898,  0.3898,
> 0.3898,  0.1978, -0.9995,  0.3792, -1.0000, -1.0000,  0.3898,  0.0837,
> 0.0837,  0.0837,  0.3898,  0.0837,  0.3898,  0.3898,  0.0837,  0.3898,
> 0.0837,  0.0837, -1.0000, -1.0000,  0.3898,  0.0841,  0.1976, -1.0000,
> 0.2467,  0.1978,  0.3842,  0.3898,  0.3848,  0.2766,  0.3898,  0.3898,
> 0.3898, -1.0000, -0.9995,  0.3898,  0.3898,  0.0837,  0.3898, -1.0000,
> 0.1978,  0.3898,  0.2766,  0.3898,  0.3898,  0.3898,  0.2766,  0.3898,
> 0.3866,  0.1978,  0.3898, -1.0000, -1.0000,  0.3898,  0.3898,  0.3898,
> 0.3898,  0.3898,  0.1978,  0.0841, -1.0000,  0.0837,  0.3898,  0.3898,
> -1.0000,  0.3898,  0.3898, -1.0000,  0.3898,  0.3898,  0.0837,  0.3898,
> 0.3898,  0.1976,  0.3898,  0.3898,  0.3898)
>
> x3 <- c( 0.9999,  0.9999,  0.9999,  1.0000, -0.9999,  0.9999, -0.9999,
> 0.9999, -0.9999, -1.0000, -1.0000, -0.9999, -0.9980, -0.9999, -0.9999,
> -1.0000, -0.9999, -0.9999, -0.9999,  1.0000, -1.0000,  1.0000, -1.0000,
> -1.0000, -1.0000, -0.9980,  1.0000, -0.9999, -1.0000, -1.0000, -0.9999,
> -0.9999,  0.9999,  1.0000, -0.9999, -1.0000,  1.0000,  0.9999,  1.0000,
> -0.9999,  0.9999, -1.0000, -1.0000, -0.9999,  0.8356,  0.8356, -0.3241,
> 0.8356,  0.8353,  0.8356,  1.0000, -1.0000, -1.0000, -1.0000, -1.0000,
> -1.0000, -0.9999,  0.9999,  1.0000, -0.9980,  0.9999,  1.0000, -1.0000,
> 1.0000, -0.9999,  1.0000,  0.9999, -1.0000,  1.0000, -1.0000,  0.9999,
> 0.9999, -1.0000, -1.0000,  1.0000, -1.0000, -1.0000,  1.0000,  1.0000,
> 1.0000, -0.9999,  1.0000, -1.0000,  1.0000, -1.0000,  1.0000, -1.0000,
> 1.0000,  1.0000,  1.0000, -1.0000, -0.9999, -0.8547, -1.0000, -0.7851,
> 0.8356, -1.0000, -0.9999, -0.9999,  1.0000)
>
> z  <- c( -0.006548414, -1.035584950, -0.006548414,  0.180549138,
> 0.741841793,  1.770878329, -0.848487398, -1.035584950, -2.251719037,
> 0.461195465,  2.051524656,  1.116036897, -0.193645966,  0.274097913,
> 0.180549138,  0.274097913,  0.274097913,  0.835390569,  0.928939345,
> -1.316231277,  0.087000362,  0.741841793,  1.116036897,  0.180549138,
> -0.193645966,  0.274097913,  0.274097913,  1.490232001, -1.222682502,
> 1.303134449,  0.367646689, -0.100097190, -0.006548414, -1.035584950,
> 1.490232001,  0.648293017, -2.064621485, -2.625914141,  1.022488121,
> -0.006548414, -1.222682502, -0.567841070, -0.942036174,  0.461195465,
> 1.770878329,  0.461195465, -1.503328829, -1.035584950, -0.848487398,
> -0.567841070,  1.396683225,  2.051524656, -0.942036174, -0.754938622,
> -1.596877605,  0.648293017, -0.287194742, -0.567841070,  0.461195465,
> -0.474292294, -0.100097190, 0.287194742,  0.554744241, -0.006548414,
> 1.209585673, -1.409780053,  0.928939345,  0.928939345, -0.006548414,
> 1.396683225, -0.380743518,  0.928939345,  1.490232001,  1.770878329,
> -1.129133726, -0.848487398, -0.380743518,  0.274097913, -1.409780053,
> -0.100097190,  0.367646689, -0.474292294,  0.554744241, -2.251719037,
> 0.087000362, -0.848487398,  0.741841793, -2.064621485, -0.006548414,
> 0.461195465, -0.100097190, -0.006548414,  0.648293017, -0.287194742,
> 0.928939345, -0.193645966, -0.474292294, -0.006548414, -1.035584950,
> 0.461195465)
>
> step(lm(z ~ 1), scope = list(lower = ~1, upper = ~x1 + x2 + x3),
>      direction = "both", k = log(length(z)))
> #########
> Start:  AIC=3.6
> z ~ 1
>
>       Df Sum of Sq    RSS    AIC
> + x1    1    15.671 83.329 -9.028
> + x2    1    12.390 86.610 -5.165
> + x3    1     7.403 91.597  0.433
> <none>              99.000  3.600
>
> Step:  AIC=-9.03
> z ~ x1
>
>       Df Sum of Sq     RSS     AIC
> + x2    1    13.675  69.654 -22.348
> + x3    1     7.078  76.251 -13.299
> <none>               83.329  -9.028
> - x1    1    15.671  99.000   3.600
>
> Step:  AIC=-22.35
> z ~ x1 + x2
>
>       Df Sum of Sq     RSS     AIC
> + x3    1     8.930  60.723 -31.463
> <none>               69.654 -22.348
> - x2    1    13.675  83.329  -9.028
> - x1    1    16.956  86.610  -5.165
>
> Step:  AIC=-31.46
> z ~ x1 + x2 + x3
>
>       Df Sum of Sq     RSS     AIC
> <none>               60.723 -31.463
> - x3    1     8.930  69.654 -22.348
> - x2    1    15.527  76.251 -13.299
> - x1    1    16.669  77.392 -11.813
>
> Call:
> lm(formula = z ~ x1 + x2 + x3)
>
> Coefficients:
> (Intercept)           x1           x2           x3
>    -0.2015       0.9000       0.7269      -0.3083
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


