[R] Testing Poisson GLMs with independent data: what's the Right Thing To Do?

Sun Jan 30 03:55:22 CET 2005

Folks, my question is not R-specific, but I've struck out twice on
sci.stat.consult, so I'm turning to the R community. Even if it's a silly
question, I expect that someone present will probably tell me so...

I have been using multiple Poisson GLMs and similar count-regression models
to analyse forest songbird abundance data. Many of the species-level models
seem to fit the data pretty well.

My next task is to validate/verify/test these models using an independent
dataset collected for this purpose (no, really!). It seems obvious that I
should apply predict.glm() to the new covariates and then somehow compare
the observed counts to the predicted expectations, but I don't know exactly
how. Some specific questions:

	-what comparisons or performance measures are appropriate?
	-how should the results be interpreted?
	-is there some other (better) way to use the new data?
	-am I overlooking something big?
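
For concreteness, the predict-and-compare step I have in mind looks roughly
like the sketch below. The data are simulated and every name in it (train,
valid, x, y, fit) is made up for illustration. The two summaries shown, an
out-of-sample Poisson deviance and a Pearson-type statistic, are only
candidates, not a claim that they are the right measures.

```r
set.seed(1)

# Simulated stand-ins for the training and validation data (hypothetical).
train <- data.frame(x = runif(200, 0, 1))
train$y <- rpois(200, lambda = exp(0.5 + 1.2 * train$x))
valid <- data.frame(x = runif(100, 0.5, 1.5))  # covariate deliberately shifted
valid$y <- rpois(100, lambda = exp(0.5 + 1.2 * valid$x))

# Fit on the training data only.
fit <- glm(y ~ x, family = poisson, data = train)

# Predicted expectations on the count scale for the new covariates.
mu <- predict(fit, newdata = valid, type = "response")

# Out-of-sample Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)),
# taking y * log(y / mu) to be 0 when y == 0.
dev_terms <- ifelse(valid$y > 0, valid$y * log(valid$y / mu), 0) -
  (valid$y - mu)
dev_valid <- 2 * sum(dev_terms)

# Pearson-type statistic on the validation data.
pearson_valid <- sum((valid$y - mu)^2 / mu)

c(deviance = dev_valid, pearson = pearson_valid, n = nrow(valid))
```

Whether numbers like these should be compared with their training-data
counterparts, with a null model, or with something else is part of what I
am asking.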

Also, the covariates in the training and validation datasets are not even
approximately identically distributed (this was on purpose, for reasons I
will gladly explain to anyone interested). I expect this must matter, but
how?
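
To make the mismatch concrete: one crude thing I can do is count how many
validation points fall outside the range of the training covariate, since
predictions there are extrapolations. The sketch below uses simulated data
and made-up names (train_x, valid_x), and handles a single covariate only.

```r
set.seed(2)

# Hypothetical covariate values; the validation range is shifted on purpose.
train_x <- runif(200, 0, 1)
valid_x <- runif(100, 0.5, 1.5)

# TRUE where a validation point lies outside the training range.
outside <- valid_x < min(train_x) | valid_x > max(train_x)

# Proportion of validation points that would require extrapolation.
mean(outside)

# Side-by-side quantiles of the two covariate distributions.
rbind(train = quantile(train_x), valid = quantile(valid_x))
```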

My bibles (e.g. Cameron and Trivedi, McCullagh and Nelder) are silent on
these points, and I can find nothing on the Web or the obvious list
archives (nothing I recognise, anyway). If any reader of this group can
offer advice, suggestions, or references, I'd sure appreciate it.

Best regards

Steve Cumming
Boreal Ecosystems Research Ltd.
http://www.berl.ab.ca