[R] an off-topic question -> model validation

bogdan romocea br44114 at yahoo.com
Fri Nov 12 15:10:57 CET 2004

Assuming you have enough data, usually 1/4 to 1/2 of it is held out for validation.

One reference would be
Picard, R.R. and Berk, K.N. (1990)
"Data Splitting," The American Statistician, 44, 140-147.
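
For what it's worth, a random holdout split of this kind can be sketched in a few lines. This is only an illustration, written in Python with the standard library; the function name split_holdout, the validation_fraction parameter, the 0.25 value and the seed are my own assumptions, not something taken from the reference or from any package.

```python
import random


def split_holdout(rows, validation_fraction=0.25, seed=42):
    """Randomly split rows into (modeling, validation) lists.

    validation_fraction is the share of rows held out for validation;
    0.25 is an assumed value at the low end of the 1/4 to 1/2 range.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")

    # A private, seeded generator keeps the split reproducible.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # The first n_validation shuffled rows are held out; the rest are for modeling.
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in data: 1000 row identifiers instead of a real data table.
modeling, validation = split_holdout(range(1000), validation_fraction=0.25)
print(len(modeling), len(validation))
```

Fixing the seed means every candidate model is compared on the same validation rows.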


-----Original Message-----
From: Wensui Liu [mailto:liuwensui at gmail.com]
Sent: Thursday, November 11, 2004 10:20 PM
To: r-help at stat.math.ethz.ch
Subject: [R] an off-topic question -> model validation

Currently, I am working on a data mining project and plan to divide
the data table into 2 parts, one for modeling and the other for
validation to compare several models.

But I am not sure about the percentage of data I should use to build
the model and the one I should keep to validate the model.

Is there any literature reference about this topic? 

Thank you so much!
