[R] About stepwise regression problem
Frank Harrell
f.harrell at vanderbilt.edu
Fri Oct 7 14:53:37 CEST 2011
Removing variables because of high P-values is not a valid procedure. Use of
AIC or BIC is just a restatement of P-values. AIC can be quite useful if
you have posited a very small number of fully pre-specified models (e.g., 2
or 3) and want to choose between them. Stepwise variable selection without
shrinkage is invalid.
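
To make the two alternatives above concrete, here is a minimal sketch using mgcv, not a definitive analysis. It assumes simulated data (the data frame `dat` and the response `y` are hypothetical stand-ins, not the poster's `groupD` data) and borrows only the variable names from the quoted model. Part (a) compares two fully pre-specified models by AIC; part (b) fits one full model with `select = TRUE`, which adds a shrinkage penalty to each smooth instead of deleting terms by P-value.

```r
library(mgcv)

set.seed(1)
n <- 200
# Simulated stand-ins for the poster's variables (hypothetical data)
dat <- data.frame(
  windspeed = runif(n, 0, 10),
  transport = runif(n, 0, 100),
  pressure  = rnorm(n),
  RH        = rnorm(n)
)
dat$y <- sin(dat$windspeed / 2) + 0.02 * dat$transport + rnorm(n, sd = 0.5)

# (a) Two candidate models fixed in advance, compared by AIC.
#     ML rather than REML is used because the models differ in their terms.
m1 <- gam(y ~ s(windspeed, bs = "cr") + s(transport, bs = "cr"),
          data = dat, method = "ML")
m2 <- gam(y ~ s(windspeed, bs = "cr") + s(transport, bs = "cr") +
            pressure + RH,
          data = dat, method = "ML")
aic_table <- AIC(m1, m2)
print(aic_table)

# (b) Shrinkage instead of stepwise deletion: select = TRUE adds an extra
#     penalty to each smooth so that a term can be shrunk towards zero.
m_shrunk <- gam(y ~ s(windspeed, bs = "cr") + s(transport, bs = "cr") +
                  s(pressure, bs = "cr") + s(RH, bs = "cr"),
                data = dat, method = "REML", select = TRUE)
edf_by_term <- summary(m_shrunk)$s.table[, "edf"]
print(round(edf_by_term, 3))
```

In (b), terms whose effective degrees of freedom end up near zero have been shrunk largely out of the model, while every candidate variable stays in the fit.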
Frank
pigpigmeow wrote:
>
> Chris,
> I'm not using lmer; I just use gam with a mix of smooth terms and
> linear terms. The summary of the model shows:
> Family: gaussian
> Link function: log
>
> Formula:
> newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(avetemp, bs = "cr") +
> s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed,
> bs = "cr") + s(transport, bs = "cr")
>
> Parametric coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.721513   0.049108  55.419   <2e-16 ***
> pressure    0.028988   0.019434   1.492    0.140
> RH          0.005228   0.009763   0.535    0.594
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Approximate significance of smooth terms:
>                edf Ref.df     F p-value
> s(maxtemp)   6.346  7.276 1.223 0.29991
> s(avetemp)   1.000  1.000 0.226 0.63562
> s(mintemp)   1.908  2.396 1.066 0.35871
> s(solar)     3.797  4.490 2.164 0.07359 .
> s(windspeed) 5.305  6.341 2.346 0.03648 *
> s(transport) 7.234  7.984 2.807 0.00884 **
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> R-sq.(adj) = 0.307 Deviance explained = 49.1%
> GCV score = 61.136 Scale est. = 44.49 n = 105
>
> In the parametric coefficients part, I see Pr(>|t|), which I take to
> mean the probability of exceeding the t value. Is that probability the
> p-value? In the "Approximate significance of smooth terms" part, the
> p-value column shows the probability of exceeding the F value.
>
> I have the following questions:
> 1. If I drop the term with the largest p-value, regardless of whether
> it is a smooth term or a linear term, is that a correct way to
> perform stepwise regression?
> 2. My model is
> noxd <- gam(newNOX ~ pressure + maxtemp + s(avetemp, bs = "cr") +
>     s(mintemp, bs = "cr") + s(RH, bs = "cr") + s(solar, bs = "cr") +
>     s(windspeed, bs = "cr") + s(transport, bs = "cr"),
>     family = gaussian(link = log), data = groupD, method = "REML")
> Is it a generalized additive mixed model?
> 3. What is the difference if I use other criteria such as AIC or BIC?
>
> Anyway, thank you all!
>
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University