[R] Error of Stepwise Regression with number of rows in use has changed: remove missing values?

Kum-Hoe Hwang phdhwang at gmail.com
Mon Feb 22 09:22:05 CET 2010


I think the suggested solution, "data <- na.omit(original database) before
you run step() or stepAIC()", has some limitations. Applying it reduced the
number of data rows, which raised the R-squared value.
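
One possible way to lose fewer rows, sketched here under assumptions: if
some rows are missing only in columns that the full model never uses,
na.omit() on the whole data frame drops them unnecessarily. The data frame
`raw`, its columns, and `full_formula` below are hypothetical stand-ins,
not the data from this thread.

```r
# Hypothetical data: x2 is a predictor with 2 missing values;
# 'unused' is a column the model never uses, with 10 missing values.
set.seed(1)
raw <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30),
                  unused = rnorm(30))
raw$x2[c(2, 5)] <- NA
raw$unused[10:19] <- NA

# Largest model that step() would start from (assumed for illustration).
full_formula <- y ~ x1 + x2

# Keep only the columns named in the full model, then drop incomplete rows.
vars <- all.vars(full_formula)
trimmed <- na.omit(raw[, vars])

n_all     <- nrow(na.omit(raw))  # rows left after na.omit() on every column
n_trimmed <- nrow(trimmed)       # rows left when only model columns count
c(all_columns = n_all, model_columns = n_trimmed)
```

This does not remove the underlying issue: rows with a missing value in any
candidate predictor are still dropped, so the fitted models can still differ
from a fit on the full data.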

If you have any tips or advice on another solution, I would welcome them.

Kum

Urban and Regional Planning, GRI


On Sat, Feb 20, 2010 at 5:57 AM, Greg Snow <Greg.Snow at imail.org> wrote:
> Have you considered the implications of that solution?
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Kum-Hoe Hwang
>> Sent: Wednesday, February 17, 2010 1:41 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] Error of Stepwise Regression with number of rows in
>> use has changed: remove missing values?
>>
>> I thank those who helped to solve an error in stepwise regression with
>> missing values.
>>
>>
>> Kum
>>
>>
>> A good solution that I have tried was Andreas's advice.
>>
>> =====================================================================
>>
>> Try
>>
>> data<-na.omit(original database) before you run step() or stepAIC()
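
A minimal sketch of the advice quoted above, using hypothetical toy data
(the names `raw`, `complete`, `fit` and `sel` are made up for illustration).
Once incomplete rows are removed up front, every model that step() fits
uses the same rows:

```r
# Hypothetical data with two missing values in one predictor.
set.seed(1)
raw <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30))
raw$x2[c(2, 5)] <- NA

# Remove incomplete rows once, before any model is fitted.
complete <- na.omit(raw)

# Fit the full model on the complete cases and run stepwise selection.
fit <- lm(y ~ x1 + x2, data = complete)
sel <- step(fit, trace = 0)  # trace = 0 suppresses the step-by-step printout

# Whichever terms are kept, the selected model uses all rows of 'complete'.
nobs(sel)
```
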
>>
>> On Tue, Feb 16, 2010 at 8:09 PM, Peter Ehlers <ehlers at ucalgary.ca>
>> wrote:
>>
>> > On 2010-02-16 1:24, Kum-Hoe Hwang wrote:
>> >
>> >> Howdy, R Gurus
>> >>
>> >> I have enjoyed R, but I cannot solve one problem easily. Please help
>> >> me with it.
>> >> When I ran the R script, I got the following error. This error
>> >> arises with an input data file exported from Excel spreadsheet
>> >> software.
>> >>
>> >>  Error in step(lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
>> >> as.numeric(nation.grant) +  :
>> >>   number of rows in use has changed: remove missing values?
>> >>
>> >> Could you tell me how to resolve this error?
>> >> Thanks in advance,
>> >>
>> >
>> > This is a common situation when you use step() on data where
>> > the predictors have missing values.
>> >
>> > A case (row) is included in the model only if all the
>> > predictors for that model are non-missing for the case.
>> >
>> > As you vary which predictors are to be in the model, the
>> > included cases will vary, resulting in models based on
>> > different data. (Think of your cases as subjects; you want
>> > all your models to be based on the same set of subjects.)
>> >
>> > Finally: (Re-)read the help page and note the 'warning'.
>> >
>> >  -Peter Ehlers
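
The point in the explanation above can be checked on a small example. This
is only a sketch with hypothetical toy data (`d`, `full`, `reduced` are
illustrative names): the model that includes the predictor with missing
values is fitted on fewer rows than the model that omits it.

```r
# Hypothetical data: x2 has three missing values, x1 has none.
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x2[c(3, 7, 11)] <- NA

# With na.action = na.omit, lm() drops rows that are incomplete
# for the variables in that particular formula.
full    <- lm(y ~ x1 + x2, data = d, na.action = na.omit)  # rows with NA in x2 dropped
reduced <- lm(y ~ x1,      data = d, na.action = na.omit)  # x2 not used, nothing dropped

n_full    <- nobs(full)
n_reduced <- nobs(reduced)
c(full = n_full, reduced = n_reduced)
```

The two fits are based on 17 and 20 rows respectively, so their AIC values
are not comparable; this is the kind of change in row count that step()
refuses to continue with.
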
>> >
>> >
>> >
>> >>
>> >>  ############### outputs from R console ###############
>> >> > pop <- step(
>> >> +     lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
>> >> +            as.numeric(nation.grant) + as.numeric(do.grant) +
>> >> +            as.numeric(city.grant) + as.numeric(DMZ.dist) +
>> >> +            as.numeric(Seoul.dist),
>> >> +        data = borderI.data, na.action = na.omit)
>> >> + )
>> >> Start:  AIC=494.27
>> >> pop.rate ~ as.numeric(year) + as.factor(policy) + as.numeric(nation.grant) +
>> >>     as.numeric(do.grant) + as.numeric(city.grant) + as.numeric(DMZ.dist) +
>> >>     as.numeric(Seoul.dist)
>> >>                            Df Sum of Sq    RSS    AIC
>> >> - as.numeric(do.grant)      1      0.71 6622.9 492.28
>> >> - as.factor(policy)         1      1.21 6623.4 492.29
>> >> - as.numeric(DMZ.dist)      1      1.91 6624.1 492.30
>> >> - as.numeric(city.grant)    1      5.07 6627.3 492.36
>> >> - as.numeric(nation.grant)  1     11.51 6633.7 492.47
>> >> - as.numeric(year)          1     29.58 6651.8 492.80
>> >> <none>                                    6622.2 494.27
>> >> - as.numeric(Seoul.dist)    1    673.22 7295.4 503.79
>> >> Step:  AIC=492.28
>> >> pop.rate ~ as.numeric(year) + as.factor(policy) + as.numeric(nation.grant) +
>> >>     as.numeric(city.grant) + as.numeric(DMZ.dist) + as.numeric(Seoul.dist)
>> >>                            Df Sum of Sq    RSS    AIC
>> >> - as.factor(policy)         1      1.99 6624.9 490.32
>> >> - as.numeric(DMZ.dist)      1      2.09 6625.0 490.32
>> >> - as.numeric(city.grant)    1      7.18 6630.1 490.41
>> >> - as.numeric(nation.grant)  1     20.08 6643.0 490.64
>> >> - as.numeric(year)          1     28.89 6651.8 490.80
>> >> <none>                                    6622.9 492.28
>> >> - as.numeric(Seoul.dist)    1    697.46 7320.4 502.20
>> >> Step:  AIC=490.32
>> >> pop.rate ~ as.numeric(year) + as.numeric(nation.grant) + as.numeric(city.grant) +
>> >>     as.numeric(DMZ.dist) + as.numeric(Seoul.dist)
>> >>                            Df Sum of Sq    RSS    AIC
>> >> - as.numeric(DMZ.dist)      1      2.08 6627.0 488.35
>> >> - as.numeric(city.grant)    1     10.65 6635.6 488.51
>> >> - as.numeric(nation.grant)  1     31.30 6656.2 488.88
>> >> - as.numeric(year)          1     31.44 6656.4 488.88
>> >> <none>                                    6624.9 490.32
>> >> - as.numeric(Seoul.dist)    1    732.88 7357.8 500.80
>> >> Step:  AIC=488.35
>> >> pop.rate ~ as.numeric(year) + as.numeric(nation.grant) + as.numeric(city.grant) +
>> >>     as.numeric(Seoul.dist)
>> >>                            Df Sum of Sq    RSS    AIC
>> >> - as.numeric(city.grant)    1      9.86 6636.9 486.53
>> >> - as.numeric(year)          1     31.42 6658.4 486.92
>> >> - as.numeric(nation.grant)  1     33.33 6660.3 486.95
>> >> <none>                                    6627.0 488.35
>> >> - as.numeric(Seoul.dist)    1    754.40 7381.4 499.18
>> >>
>> >> Error in step(lm(pop.rate ~ as.numeric(year) + as.factor(policy) +
>> >> as.numeric(nation.grant) +  :
>> >>   number of rows in use has changed: remove missing values?
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Kum-Hoe Hwang, Ph.D.
>> >>
>> >> Phone : 82-31-250-3516
>> >> Email : phdhwang at gmail.com
>> >>
>> >>
>> > --
>> > Peter Ehlers
>> > University of Calgary
>> >
>>
>>
>>
>> --
>> Kum-Hoe Hwang, Ph.D.
>>
>> Phone : 82-31-250-3516
>> Email : phdhwang at gmail.com
>>
>>       [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Kum-Hoe Hwang, Ph.D.

Phone : 82-31-250-3516
Email : phdhwang at gmail.com


