[R] Generic functions with models? 'predict', 'residuals', 'fitted', ...?
John McHenry
john_d_mchenry at yahoo.com
Wed Feb 4 03:10:17 CET 2009
Hi WizaRds!
I don't know if my understanding of models is screwed up or
if there's a fundamental limitation to the way they are
used in R.
Here's what I'd like to do: create a model and then test new data
against that model using generics like 'predict' and 'residuals'.
The problem is that, except of course for 'predict', the generics
don't seem to accept a new data set:
set.seed(1)
# training data: response y and predictors x1, x2, x3
df <- data.frame(matrix(rnorm(10*4), ncol=4))
cn <- c("y", "x1", "x2", "x3")
colnames(df) <- cn
model <- lm(y ~ x1 + x2 + x3 + 1, data=df)
u <- residuals(model, data=df)
# independent test data with the same column names
test <- data.frame(matrix(rnorm(10*4), ncol=4))
colnames(test) <- cn
y.hat <- predict(model, newdata=test)
u.new <- residuals(model, newdata=test)
# I would expect 'residuals(model, newdata=test)' to give the
# residuals of the 'test' data under 'model',
# but alas, no:
cbind(u, u.new)
              u        u.new
1   0.032996295  0.032996295
2   0.575261650  0.575261650
3  -0.702501425 -0.702501425
4   0.482075538  0.482075538
5   0.300600839  0.300600839
6  -0.980832577 -0.980832577
7   0.265239680  0.265239680
8  -0.341625071 -0.341625071
9   0.363224527  0.363224527
10  0.005560545  0.005560545
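A minimal base-R check suggests why the two columns are identical. It re-creates the example above; the variable names `res_args`, `pred_args` and `same` are only illustrative. The `lm` method for `residuals()` declares no `newdata` or `data` argument, so whatever is passed under those names lands in `...` and is silently ignored, whereas the `lm` method for `predict()` does declare `newdata`.

```r
# Re-create the model and test data from the post.
set.seed(1)
df <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
cn <- c("y", "x1", "x2", "x3")
colnames(df) <- cn
model <- lm(y ~ x1 + x2 + x3 + 1, data = df)
test <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
colnames(test) <- cn

# Formal arguments of the lm method for residuals(): only object, type, ...
res_args <- names(formals(getS3method("residuals", "lm")))
print(res_args)

# The lm method for predict() does have a 'newdata' argument.
pred_args <- names(formals(getS3method("predict", "lm")))
print("newdata" %in% pred_args)

# 'newdata' is swallowed by '...', so the result is unchanged.
same <- identical(residuals(model), residuals(model, newdata = test))
print(same)
```

Because `...` accepts any named argument without complaint, a misspelled or unsupported argument produces no warning here.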
# Likewise, I would expect 'fitted(model, newdata=test)' to give
# the fitted values of 'test' under 'model', but again no:
cbind(fitted(model), fitted(model, newdata=test))
          [,1]        [,2]
1  -0.65945011 -0.65945011
2  -0.39161833 -0.39161833
3  -0.13312719 -0.13312719
4   1.11320526  1.11320526
5   0.02890693  0.02890693
6   0.16036419  0.16036419
7   0.22218937  0.22218937
8   1.07994978  1.07994978
9   0.21255683  0.21255683
10 -0.31094893 -0.31094893
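A similar sketch, again re-creating the example with illustrative variable names, for `fitted()`. In R's stats package, `lm` objects appear to have no `fitted` method of their own, so the call falls through to `fitted.default(object, ...)`, which has no `newdata` argument either. The function that plays the role of `fitted()` for new data is `predict()`: without `newdata` it reproduces the in-sample fitted values, and with `newdata` it returns fitted values for the new rows.

```r
# Re-create the model and test data from the post.
set.seed(1)
df <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
cn <- c("y", "x1", "x2", "x3")
colnames(df) <- cn
model <- lm(y ~ x1 + x2 + x3 + 1, data = df)
test <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
colnames(test) <- cn

# Is there an lm-specific fitted() method? (NULL means no.)
has_lm_method <- !is.null(getS3method("fitted", "lm", optional = TRUE))
print(has_lm_method)

# fitted.default only takes 'object' and '...'.
fit_args <- names(formals(getS3method("fitted", "default")))
print(fit_args)

# predict() without newdata matches fitted() up to rounding ...
in_sample <- all.equal(fitted(model), predict(model))
print(in_sample)

# ... and with newdata it gives the fitted values for the test rows.
y_hat_test <- predict(model, newdata = test)
print(y_hat_test)
```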
Instead, I have to do:
y.hat <- predict(model, newdata=test)
# test residuals: observed minus predicted
u.new <- test$y - y.hat
u.new
         1          2          3          4          5          6          7
 1.3661751 -0.4077999  1.1740108  0.4492772 -1.5865168 -0.7633108 -0.8800892
         8          9         10
 1.7474658 -0.1016104  2.1081361
to get the residuals. Maybe this is what one is supposed to do?
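The manual computation above can be wrapped in a small function. `residuals_newdata()` below is a hypothetical helper, not part of R; it is a sketch assuming a plain `lm` fit and a `newdata` data frame that contains the response column. Rather than hard-coding `test$y`, it builds a model frame for `newdata` from the model's terms and pulls the response out with `model.response()`, then subtracts `predict(model, newdata = newdata)`.

```r
# Hypothetical helper (not a base R function): out-of-sample residuals,
# i.e. observed response minus prediction, for an lm fit on new data.
residuals_newdata <- function(model, newdata) {
  # Evaluate the model's terms on newdata; na.pass keeps rows aligned with
  # predict(), whose default for new data is also na.pass.
  mf <- model.frame(terms(model), data = newdata,
                    na.action = na.pass, xlev = model$xlevels)
  observed <- model.response(mf)
  observed - predict(model, newdata = newdata)
}

# Example with the data from the post.
set.seed(1)
df <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
cn <- c("y", "x1", "x2", "x3")
colnames(df) <- cn
model <- lm(y ~ x1 + x2 + x3 + 1, data = df)
test <- data.frame(matrix(rnorm(10 * 4), ncol = 4))
colnames(test) <- cn

u.new <- residuals_newdata(model, test)
print(u.new)
```

Taking the response from the terms means that a formula such as `log(y) ~ x1` would give residuals on the log scale, which is the scale `predict()` returns.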
To be fair, the help pages (e.g. for 'residuals.lm')
don't say that you can pass 'newdata=test', as in
'residuals(model, newdata=test)',
or 'data=test', as in
'residuals(model, data=test)',
but that is exactly what I would like to do!
Am I missing something here, or is this just the way it is?
Thanks guys!
Jack.