# [R] About stepwise regression problem

Frank Harrell f.harrell at vanderbilt.edu
Fri Oct 7 14:53:37 CEST 2011

```
Removing variables because of high P-values is not a valid procedure.  Use of
AIC or BIC is just a restatement of P-values.  AIC can be quite useful if
you have posited a very small number of fully pre-specified models (e.g., 2
or 3) and want to choose between them.  Stepwise variable selection without
shrinkage is invalid.
Frank

pigpigmeow wrote:
>
> chris,
> I'm not using lmer; I just use gam with a mix of smooth terms and
> linear terms.
> The summary of the model shows:
> Family: gaussian
>
> Formula:
> newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(avetemp, bs = "cr") +
>     s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed,
>     bs = "cr") + s(transport, bs = "cr")
>
> Parametric coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.721513   0.049108  55.419   <2e-16 ***
> pressure    0.028988   0.019434   1.492    0.140
> RH          0.005228   0.009763   0.535    0.594
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Approximate significance of smooth terms:
>                edf Ref.df     F p-value
> s(maxtemp)   6.346  7.276 1.223 0.29991
> s(avetemp)   1.000  1.000 0.226 0.63562
> s(mintemp)   1.908  2.396 1.066 0.35871
> s(solar)     3.797  4.490 2.164 0.07359 .
> s(windspeed) 5.305  6.341 2.346 0.03648 *
> s(transport) 7.234  7.984 2.807 0.00884 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> R-sq.(adj) =  0.307   Deviance explained = 49.1%
> GCV score = 61.136  Scale est. = 44.49     n = 105
>
> In the parametric coefficients part, I see Pr(>|t|), which I take to mean
> the probability of exceeding the t-value. Is that probability the p-value?
> In the "Approximate significance of smooth terms" part, the p-value column
> shows the probability of exceeding the F-value.
>
> I have the following questions:
> 1. If I drop every term with a high p-value, whether it is a smooth
> term or a linear term, is that a correct way to perform stepwise
> regression?
> 2. In my model
> noxd<-gam(newNOX~pressure+maxtemp+s(avetemp,bs="cr")+s(mintemp,bs="cr")+s(RH,bs="cr")+s(solar,bs="cr")+s(windspeed,bs="cr")+s(transport,bs="cr"),family=gaussian)
> 3. What is the difference if I use other criteria such as AIC or BIC?
>
> Anyway, thank you all!
>

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University

```
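The reply above makes two points that can be illustrated with a small sketch: AIC is reasonable for choosing among a handful of fully pre-specified models, and term selection needs some form of shrinkage. The code below is only an illustration under stated assumptions. The data are simulated (the poster's data are not available), the two candidate models are hypothetical, and using mgcv's `select = TRUE` penalty as the shrinkage mechanism is an assumption about one possible approach, not something the reply itself recommends.

```r
library(mgcv)  # recommended package, normally shipped with R

set.seed(1)
n <- 105  # matches n in the quoted summary; the values themselves are simulated
dat <- data.frame(
  windspeed = runif(n, 0, 10),
  transport = runif(n, 0, 100),
  solar     = runif(n, 0, 30),
  RH        = runif(n, 40, 100)
)
# Hypothetical truth: the response depends on windspeed and transport only
dat$newNO2 <- 3 + sin(dat$windspeed / 2) + 0.02 * dat$transport +
  rnorm(n, sd = 0.5)

# (a) Two candidate models written down before looking at the data,
#     fitted by ML and compared once by AIC (no stepping through terms).
m_small <- gam(newNO2 ~ s(windspeed, bs = "cr") + s(transport, bs = "cr"),
               data = dat, method = "ML")
m_large <- gam(newNO2 ~ s(windspeed, bs = "cr") + s(transport, bs = "cr") +
                 s(solar, bs = "cr") + RH,
               data = dat, method = "ML")
aic_tab <- AIC(m_small, m_large)
print(aic_tab)

# (b) Keep every candidate term in one model and let an extra penalty
#     shrink unneeded smooths toward zero (select = TRUE), instead of
#     deleting terms by p-value.
m_shrunk <- gam(newNO2 ~ s(windspeed, bs = "cr") + s(transport, bs = "cr") +
                  s(solar, bs = "cr") + s(RH, bs = "cr"),
                data = dat, method = "REML", select = TRUE)
edf_shrunk <- summary(m_shrunk)$s.table[, "edf"]
print(round(edf_shrunk, 3))  # effective degrees of freedom per smooth term
```

In (b), a smooth whose effective degrees of freedom are close to zero has been penalized almost entirely out of the fit, while all terms stay in the model formula. Whether this is adequate for a given analysis is a judgment call that the sketch does not settle.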