[R] Testing Poisson GLMs with independent data: what's the Right Thing To Do?
Steve Cumming
stevec at berl.ab.ca
Sun Jan 30 03:55:22 CET 2005
Folks, my question is not R-specific, but I've struck out twice on
sci.stat.consult, so I'm turning to the R community. Even if it's a silly
question, I expect that someone present will probably tell me so...
I have been using multiple Poisson GLMs and similar count-regression models
to analyse forest songbird abundance data. Many of the species-level models
seem to fit the data pretty well.
My next task is to validate/verify/test these models using an independent
dataset collected for this purpose (no, really!). It seems obvious that I
should apply predict.glm() to the new covariates and then somehow compare
the observed values to the predicted expectations, but I don't know how
exactly. Some specific questions:
-what comparisons or performance measures are appropriate?
-how should the results be interpreted?
-is there some other (better) way to use the new data?
-am I overlooking something big?
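
To make the predict-and-compare step concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration only: the data frames `train` and `valid`, the covariates `x1` and `x2`, the coefficients, and the particular summaries (RMSE, out-of-sample Poisson deviance, a Pearson statistic per observation, and a calibration regression). It shows some candidate comparisons, not a recommended or definitive procedure.

```r
## Illustrative only: simulated data stand in for the real surveys.
set.seed(1)
n_train <- 200
n_valid <- 100
train <- data.frame(x1 = runif(n_train), x2 = rnorm(n_train))
## Validation covariates deliberately drawn from a shifted distribution
valid <- data.frame(x1 = runif(n_valid, 0.3, 1.3), x2 = rnorm(n_valid, 0.5))
true_mean <- function(d) exp(0.2 + 0.8 * d$x1 - 0.4 * d$x2)  # hypothetical truth
train$count <- rpois(n_train, true_mean(train))
valid$count <- rpois(n_valid, true_mean(valid))

## Fit on the training data only
fit <- glm(count ~ x1 + x2, family = poisson, data = train)

## Predicted expectations on the response (count) scale for the new covariates
mu_hat <- predict(fit, newdata = valid, type = "response")
y <- valid$count

## Root mean squared error of observed counts against predicted means
rmse <- sqrt(mean((y - mu_hat)^2))

## Out-of-sample Poisson deviance; the y * log(y / mu) term is taken as 0 when y == 0
dev_terms <- ifelse(y == 0, 0, y * log(y / mu_hat)) - (y - mu_hat)
deviance_valid <- 2 * sum(dev_terms)

## Pearson statistic per observation; values well above 1 hint at
## overdispersion or misfit on the new data
pearson_valid <- sum((y - mu_hat)^2 / mu_hat) / length(y)

## Calibration regression: observed counts on log predicted mean.
## An intercept near 0 and a slope near 1 indicate well-calibrated predictions.
calib <- glm(y ~ log(mu_hat), family = poisson)

print(c(rmse = rmse, deviance = deviance_valid, pearson = pearson_valid))
print(coef(calib))
```

Note that `type = "response"` is what puts the predictions on the count scale; the default for `predict.glm()` is the link (log) scale.
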
Also, the covariates in the training and validation datasets are not even
approximately identically distributed (this was on purpose, for reasons I
will gladly explain to anyone interested). I expect this must matter, but
how?
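
One rough way to see how far apart the two covariate distributions are, sketched here on simulated data, is to count validation records that fall outside the training range and to run a two-sample test per covariate. The data frames `train` and `valid` and the covariates `x1` and `x2` are hypothetical, and this only quantifies the difference; it does not say how the difference should affect the interpretation.

```r
## Illustrative only: simulated covariates with a deliberate shift.
set.seed(2)
train <- data.frame(x1 = runif(200), x2 = rnorm(200))
valid <- data.frame(x1 = runif(100, 0.3, 1.3), x2 = rnorm(100, 0.5))
covars <- c("x1", "x2")

## Fraction of validation rows lying outside the training range, per covariate.
## Predictions for such rows are extrapolations of the fitted model.
outside <- sapply(covars, function(v) {
  r <- range(train[[v]])
  mean(valid[[v]] < r[1] | valid[[v]] > r[2])
})
print(outside)

## Two-sample Kolmogorov-Smirnov test as a crude check of distributional difference
ks_p <- sapply(covars, function(v) ks.test(train[[v]], valid[[v]])$p.value)
print(ks_p)
```
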
My bibles (e.g. Cameron and Trivedi, McCullagh and Nelder) are silent on
these points, and I can find nothing on the Web or the obvious list
archives (nothing I recognise, anyway). If any reader of this group can
offer advice, suggestions, or references, I'd sure appreciate it.
Best regards
Steve Cumming
Boreal Ecosystems Research Ltd.
http://www.berl.ab.ca