[R] Variable selection based on both training and testing data
Liaw, Andy
andy_liaw at merck.com
Mon Jan 30 14:39:05 CET 2012
Variable selection is part of the training process: it chooses the model. By definition, test data is used only for testing (evaluating the chosen model).
If you find a package or function that does variable selection on test data, run from it!
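A minimal sketch of that separation, assuming the built-in mtcars data, an arbitrary 22/10 split, and base R's step() standing in for stepAIC (these choices are illustrative and not from the original question):

```r
# Illustrative only: the data set, split size, and seed are assumptions.
set.seed(1)
idx   <- sample(nrow(mtcars), size = 22)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Variable selection sees the training data only.
full <- lm(mpg ~ ., data = train)
sel  <- step(full, direction = "backward", trace = 0)

# The test data is used once, after the model has been chosen.
rmse      <- function(obs, pred) sqrt(mean((obs - pred)^2))
test_rmse <- rmse(test$mpg, predict(sel, newdata = test))

print(formula(sel))
print(test_rmse)
```

If the selection itself should be driven by held-out error, one common option is to split or cross-validate within the training data, so that the test set still plays no part in choosing the model.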
Best,
Andy
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jin Minming
> Sent: Monday, January 30, 2012 8:14 AM
> To: r-help at r-project.org
> Subject: [R] Variable selection based on both training and
> testing data
>
> Dear all,
>
> Variable selection in regression is usually determined from
> the training data using AIC or an F value, as in stepAIC. Is
> there an R package that can consider both the training and
> test datasets? For example, I have separate training
> and test data. First, a regression model is fitted
> using the training data, and then this model is evaluated on
> the test data. This process is repeated in order to find
> possibly optimal models in terms of RMSE or R2 for both
> training and test data.
>
> Thanks,
>
> Jim