[R] Cross Validation output
Donald Catanzaro, PhD
dgcatanzaro at gmail.com
Fri Sep 26 19:17:35 CEST 2008
Good Day All,
I have a negative binomial model that I created using the function
glm.nb() from the MASS package, and I am performing a cross-validation
using the function cv.glm() from the boot package. I really want to
determine the performance of this model so I can have confidence (or
not) when it is applied elsewhere.
If I understand the cv.glm() procedure correctly, the default cost
function is the average squared error, and by running cv.glm() in a
loop many times I understand that I can calculate PRESS (PRedictive
Error Sum of Squares = 1/n*Sum(all PEs)) from the default output.
When I run the loop 10 times, my PRESS is ~25.
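A minimal sketch of the loop described above, for readers who want to reproduce it. The original data and model formula are not given in this post, so the `quine` data set from MASS and the formula `Days ~ Eth + Sex + Age + Lrn` are stand-in assumptions, as are the seed, `K = 10`, and the variable names. It relies on cv.glm() returning a `delta` vector whose first element is the raw cross-validation estimate of prediction error under the default (average squared error) cost.

```r
library(MASS)   # glm.nb() and the stand-in quine data
library(boot)   # cv.glm()

set.seed(1)     # assumption: fixed seed so the random folds are reproducible

# Stand-in model; replace the formula and data with your own.
fit <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)

n_rep  <- 10                 # number of repeated 10-fold cross-validations
cv_mse <- numeric(n_rep)
for (r in seq_len(n_rep)) {
  # delta[1] is the raw CV estimate of prediction error
  # (default cost: mean squared difference between y and prediction)
  cv_mse[r] <- cv.glm(quine, fit, K = 10)$delta[1]
}

mean_cv_mse <- mean(cv_mse)        # average over the repeats
rmse        <- sqrt(mean_cv_mse)   # back on the scale of the response
```

Under these assumptions the averaged value is a mean squared prediction error in squared units of the response, not a percentage. Its square root is in the units of the response, so a value of 25 would correspond to a typical prediction error of about 5 counts.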
I have a few questions:
1) I must now confess my ignorance: how does one interpret my PRESS of
25? Are there some internet resources that someone could point me to
to help in the interpretation? I've spent most of yesterday studying
up on things but feel like I am chasing my tail. Most of the resources
are either so heavy in theory that I can't puzzle them out or are a
couple of paragraphs long and don't have an example with data in them. Is
my PRESS in essence saying that my model performance is ~75%? (I
suspect not, but I don't know, thus I ask.)
2) All my observations are spatial in nature and thus I would like to
plot out spatially where the model is performing well and where it is
not. This would be somewhat akin to inspecting residuals in OLS. Is
there a way to get cv.glm() to output the PEs for individual data points?
3) My previous idea was to look at AIC, BIC, McFadden's R2 and pseudo-R2 as
goodness-of-fit (GOF) measures of each subset model. It appears that I can
modify the cost function of cv.glm(), but I am not too confident in my
ability to write the correct cost function. Are there other valid
measures of GOF for my negative binomial model that I can substitute
into the cost function of cv.glm()? Would anyone care to recommend one
(or many)?
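To make questions 2 and 3 concrete, here is a hedged sketch rather than a definitive answer. cv.glm() returns only summary values, so the first part runs the K-fold loop by hand to keep one out-of-fold prediction error per observation. The second part passes custom cost functions to cv.glm(). A cost function receives only the observed and predicted responses, so model-level measures such as AIC or BIC do not fit that interface. The `quine` data, the formula, the fold assignment, and the names `pe`, `mae_cost` and `pois_dev_cost` are all illustrative assumptions.

```r
library(MASS)   # glm.nb() and the stand-in quine data
library(boot)   # cv.glm()

set.seed(2)     # assumption: fixed seed for reproducible folds
n    <- nrow(quine)
K    <- 10
fold <- sample(rep(seq_len(K), length.out = n))   # random fold labels

# Question 2: out-of-fold prediction for every observation
pred <- numeric(n)
for (k in seq_len(K)) {
  test  <- fold == k
  fit_k <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine[!test, ])
  pred[test] <- predict(fit_k, newdata = quine[test, ], type = "response")
}
pe <- quine$Days - pred   # one prediction error per row, in row order

# Question 3: cost functions take (observed, predicted) vectors
mae_cost <- function(y, yhat) mean(abs(y - yhat))   # mean absolute error

# Mean Poisson deviance, one possible count-oriented choice (an assumption,
# not a recommendation from the original post); y = 0 terms contribute 0.
pois_dev_cost <- function(y, yhat) {
  2 * mean(ifelse(y == 0, 0, y * log(y / yhat)) - (y - yhat))
}

fit    <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)
cv_mae <- cv.glm(quine, fit, cost = mae_cost, K = 10)$delta[1]
cv_dev <- cv.glm(quine, fit, cost = pois_dev_cost, K = 10)$delta[1]
```

Because `pe` is in the same row order as the data, it can be bound to the coordinates of each observation and mapped in the same way as residuals.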
Thanks in advance for your patience!
-Don
PS - if you've seen my previous posts, I've abandoned my 80/20 split
validation scheme.
--
-Don
Don Catanzaro, PhD Landscape Ecologist
dgcatanzaro at gmail.com 16144 Sigmond Lane
479-751-3616 Lowell, AR 72745