[R] In-sample / Out-of-sample using R
Ajay Shah
ajayshah at mayin.org
Tue Apr 13 18:08:15 CEST 2004
I'm trying to learn how to use R to:
* Make a random partition of a data frame between in-sample and
out-of-sample
* Estimate a model (e.g. lm()) for the in-sample
* Make predictions for all observations
* Compare the in-sample error sigma against the out-of-sample error
sigma.
I came up with the following code. I think it's okay, but I can't help
feeling this is still clunky. Could all ye R wizards please comment on
this, and tell me how I can do it better?
---------------------------------------------------------------------------
# Simulate some data for a linear regression (100 points)
x = runif(100); y = 2 + 3*x + rnorm(100)
D = data.frame(x, y)
# Choose a random subset of 25 points which will be "in sample"
d = sort(sample(100, 25)) # Sorting just makes d more readable
cat("Subset of insample points --\n"); print(d)
# Estimate a linear regression using all points
m1 = lm(y ~ x, D)
# Estimate a linear regression using only the subset
m2 = lm(y ~ x, D, subset=d)
# Get predictions for all observations from both models --
yhat1 = predict(m1, D); yhat2 = predict(m2, D)
# And standard deviations of errors --
full.s = sd(y - yhat1)
insample.s = sd(y[d] - yhat2[d])
outsample.s = sd(y[-d] - yhat2[-d])
cat("Sigmas of prediction errors --\n")
cat(" All points used in estimation, in sample : ", full.s, "\n")
cat(" 25 points used in estimation, in sample : ", insample.s, "\n")
cat(" 25 points used in estimation, out of sample : ", outsample.s, "\n")
---------------------------------------------------------------------------
Here's what I get when I run it:
$ R --slave < insampleoutsample.R
Subset of insample points --
[1] 4 6 7 13 20 21 24 25 26 27 29 33 34 36 39 45 47 48 59 60 88 89 91 96 98
Sigmas of prediction errors --
All points used in estimation, in sample : 0.9405517
25 points used in estimation, in sample : 1.000709
25 points used in estimation, out of sample : 0.9586921
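
One possible way to tighten this up, offered as a rough sketch rather than a definitive answer: wrap the split, the fit, and the scoring in a single function. The name split_fit_score and its arguments are hypothetical and are not part of R. The seed is arbitrary and only there to make the sketch repeatable. The sketch also reports the root mean squared error next to sd(), on the assumption that it is a useful companion: sd() subtracts the mean error first, and out of sample that mean need not be zero.

```r
# Hypothetical helper (not from the script above): partition, fit, and score.
split_fit_score <- function(data, n_in, formula = y ~ x) {
  in_idx <- sort(sample(nrow(data), n_in))       # random in-sample rows
  fit <- lm(formula, data = data[in_idx, ])      # estimate on in-sample rows only
  response <- all.vars(formula)[1]               # name of the left-hand-side variable
  err <- data[[response]] - predict(fit, newdata = data)  # errors for all rows
  rmse <- function(e) sqrt(mean(e^2))            # does not subtract the mean error
  c(insample.sd    = sd(err[in_idx]),
    outsample.sd   = sd(err[-in_idx]),
    insample.rmse  = rmse(err[in_idx]),
    outsample.rmse = rmse(err[-in_idx]))
}

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- runif(100); y <- 2 + 3*x + rnorm(100)
D <- data.frame(x, y)
res <- split_fit_score(D, 25)
print(round(res, 3))
```

Calling split_fit_score(D, 25) several times gives a feel for how much the two sigmas move from one random partition to the next.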
--
Ajay Shah Consultant
ajayshah at mayin.org Department of Economic Affairs
http://www.mayin.org/ajayshah Ministry of Finance, New Delhi