[R] Testing Poisson GLMs with independent data: what's the Right Thing To Do?

Steve Cumming stevec at berl.ab.ca
Sun Jan 30 03:55:22 CET 2005


Folks, my question is not R-specific, but I've struck out twice on
sci.stat.consult, so I'm turning to the R community. Even if it's a silly
question, I expect someone here will tell me so...

I have been using multiple Poisson GLMs and similar count-regression models
to analyse forest songbird abundance data. Many of the species-level models
seem to fit the data pretty well.

My next task is to validate/verify/test these models using an independent
dataset collected for this purpose (no, really!). It seems obvious that I
should apply predict.glm() to the new covariates and then somehow compare
the observed values to the predicted expectations, but I don't know exactly
how. Some specific questions:

	- What comparisons or performance measures are appropriate?
	- How should the results be interpreted?
	- Is there some other (better) way to use the new data?
	- Am I overlooking something big?
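To make the "somehow compare" step concrete, here is a minimal sketch. It is one possible approach, not a settled answer to the questions above. It assumes a log-link Poisson model, so the predicted expectations are mu = exp(x . beta), which is what predict.glm(fit, newdata, type = "response") returns in R. It also assumes the Poisson deviance and the Pearson chi-square statistic, evaluated on the validation data, as the discrepancy measures. The coefficients and data in the example are hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Sequence


def predict_mean(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted means under a log link: mu_i = exp(x_i . beta)."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


def poisson_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """Total Poisson deviance D = 2 * sum(y*log(y/mu) - (y - mu)).

    The y*log(y/mu) term is taken as 0 when y == 0 (its limiting value).
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total


def pearson_chisq(y: Sequence[float], mu: Sequence[float]) -> float:
    """Pearson statistic X^2 = sum((y - mu)^2 / mu)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def validation_summary(beta: Sequence[float],
                       X_new: Sequence[Sequence[float]],
                       y_new: Sequence[float]) -> Dict[str, float]:
    """Compare observed validation counts with the model's predicted means."""
    mu = predict_mean(beta, X_new)
    dev = poisson_deviance(y_new, mu)
    return {
        "observed_total": float(sum(y_new)),  # crude calibration check ...
        "predicted_total": sum(mu),           # ... against this total
        "deviance": dev,
        "mean_deviance": dev / len(y_new),    # per-site, comparable across n
        "pearson_chisq": pearson_chisq(y_new, mu),
    }


if __name__ == "__main__":
    # Hypothetical coefficients (intercept, slope) and validation sites.
    beta = [0.0, 1.0]
    X_new = [[1.0, 0.0], [1.0, math.log(2.0)]]  # first column = intercept
    y_new = [0, 3]
    print(validation_summary(beta, X_new, y_new))
```

The deviance and Pearson statistics are totals, so dividing by the number of validation sites (as mean_deviance does) makes them easier to compare across datasets of different sizes. Comparing observed and predicted totals only checks overall calibration, not the fit at individual sites.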

Also, the covariates in the training and validation datasets are not even
approximately identically distributed (this was on purpose, for reasons I
will gladly explain to anyone interested). I expect this must matter, but
how?

My bibles (e.g. Cameron and Trivedi, McCullagh and Nelder) are silent on
these points, and I can find nothing on the Web or in the obvious list
archives (nothing I recognise, anyway). If any reader of this group can
offer advice, suggestions, or references, I'd sure appreciate it.


Best regards

Steve Cumming
Boreal Ecosystems Research Ltd.
http://www.berl.ab.ca



