[R] Variable selection based on both training and testing data
Jin Minming
jminming at yahoo.com
Mon Jan 30 19:20:37 CET 2012
I do not have enough test data for a regression analysis, although I know there are some statistical regression methods that can be used for small datasets. That is why I need to build a model first using the training dataset.
Thanks,
Jim
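
One way to reconcile a small test set with the advice quoted below (selection belongs to training, the test set only evaluates) is to run the selection against cross-validated error computed inside the training data, and to touch the test data exactly once at the end. The sketch below illustrates that pattern in Python rather than R, purely as an illustration. The data are synthetic, and every name in it (`fit_ols`, `cv_rmse`, `forward_select`, the fold count of 5) is a hypothetical choice made for this example, not something from the thread or from any R package.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def rmse(beta, X, y):
    """Root mean squared error of the fitted model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return (sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)) ** 0.5

def cols(X, idx):
    """Keep only the columns listed in idx, in that order."""
    return [[r[j] for j in idx] for r in X]

def cv_rmse(X, y, idx, k=5):
    """Mean k-fold RMSE for the model using columns idx (training data only)."""
    n = len(y)
    errs = []
    for f in range(k):
        hold = [i for i in range(n) if i % k == f]
        keep = [i for i in range(n) if i % k != f]
        beta = fit_ols(cols([X[i] for i in keep], idx), [y[i] for i in keep])
        errs.append(rmse(beta, cols([X[i] for i in hold], idx),
                         [y[i] for i in hold]))
    return sum(errs) / k

def forward_select(X, y, k=5):
    """Greedy forward selection: add the variable that lowers CV RMSE most."""
    remaining = list(range(len(X[0])))
    chosen = []
    best = cv_rmse(X, y, chosen, k)  # intercept-only baseline
    while remaining:
        score, j = min((cv_rmse(X, y, chosen + [j], k), j) for j in remaining)
        if score >= best:
            break  # no candidate improves the cross-validated error
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen

# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = random.Random(0)

def make(n):
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2 * r[0] - 3 * r[2] + rng.gauss(0, 0.3) for r in X]
    return X, y

X_train, y_train = make(60)
X_test, y_test = make(20)

chosen = forward_select(X_train, y_train)        # uses training data only
beta = fit_ols(cols(X_train, chosen), y_train)   # refit on all training rows
test_rmse = rmse(beta, cols(X_test, chosen), y_test)  # single final check
print(sorted(chosen), round(test_rmse, 3))
```

The test set never influences which variables are kept, so `test_rmse` remains an honest estimate of out-of-sample error. Repeating the fit-then-test loop until the test RMSE looks good would turn the test data into training data.
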
--- On Mon, 30/1/12, Liaw, Andy <andy_liaw at merck.com> wrote:
> From: Liaw, Andy <andy_liaw at merck.com>
> Subject: RE: [R] Variable selection based on both training and testing data
> To: "'Jin Minming'" <jminming at yahoo.com>, "r-help at r-project.org" <r-help at r-project.org>
> Date: Monday, 30 January, 2012, 13:39
> Variable selection is part of the training process: it chooses the model.
> By definition, test data is used only for testing (evaluating the chosen
> model).
>
> If you find a package or function that does variable
> selection on test data, run from it!
>
> Best,
> Andy
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Jin Minming
> > Sent: Monday, January 30, 2012 8:14 AM
> > To: r-help at r-project.org
> > Subject: [R] Variable selection based on both training and testing data
> >
> > Dear all,
> >
> > Variable selection in regression is usually determined by the training
> > data using AIC or an F value, as in stepAIC. Is there an R package that
> > can consider both the training and test datasets? For example, I have
> > separate training and test data. First, a regression model is obtained
> > using the training data, and then this model is tested using the test
> > data. This process continues in order to find some possible optimal
> > models in terms of RMSE or R2 for both training and test data.
> >
> > Thanks,
> >
> > Jim