[R] goodness of "prediction" using a model (lm, glm, gam, brt, regression tree .... )
Corrado
ct529 at york.ac.uk
Thu Sep 3 07:56:59 CEST 2009
Dear R-friends,
How do you test the goodness of prediction of a model when you predict on a
set of data DIFFERENT from the training set?
Let me explain: you train your model M (e.g. glm, gam, regression tree, brt)
on a data set A with a response variable Y. You then predict the value of
that same response variable Y on a different data set B (e.g. with predict.glm,
predict.gam, and so on). Data sets A and B are different in the sense that
they contain the same predictor, for example temperature, measured at different
sites or over a different interval (e.g. B is a subinterval of A for
interpolation, or a disjoint interval for extrapolation). Given the
measured values of Y on the new data set B, how do you measure how good
the prediction is, that is, how well the model fits Y on B (in other words,
how well it predicts)?
In other words:
Y ~ T, data = A for training
Y ~ T, data = B for predicting
I have devised a couple of methods based on 1) the standard deviation and
2) R^2, but I am unhappy with them.
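As a concrete starting point, here is a minimal R sketch of how one might score out-of-sample predictions with common summary measures (RMSE, MAE, and a predictive R^2). The data frames A and B, the column names Y and T, and the choice of glm are assumptions taken from the question, not a definitive recipe:

```r
## Sketch: compare model predictions with held-out observations.
## Assumes two data frames A (training) and B (validation), each with
## a response column Y and a predictor column T; glm is illustrative,
## the same pattern works for gam, rpart, etc. via their predict methods.
M <- glm(Y ~ T, data = A)
pred <- predict(M, newdata = B, type = "response")
obs  <- B$Y

## Root mean squared error of prediction
rmse <- sqrt(mean((obs - pred)^2))

## Mean absolute error
mae <- mean(abs(obs - pred))

## Predictive R^2 (1 - SSE/SST); unlike the training R^2 this can be
## negative when the model predicts worse than the mean of the new data,
## which is common under extrapolation
r2.pred <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
```

Note that the predictive R^2 above is computed against the observed values in B, not the training fit, so it directly measures prediction quality rather than goodness of fit.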
Regards
--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk