# [R] Cross Validation output

Donald Catanzaro, PhD dgcatanzaro at gmail.com
Fri Sep 26 19:17:35 CEST 2008

```
Good Day All,

I have a negative binomial model that I created using the function
glm.nb() from the MASS package and I am performing a cross-validation
using the function cv.glm() from the boot package.  I am really
interested in determining the performance of this model so I can have
confidence (or not) when it might be applied elsewhere.

If I understand the cv.glm() procedure correctly, the default cost
function is the average squared error, and by running cv.glm() in a
loop many times I understand that I can calculate PRESS (PRedictive
Error Sum of Squares) = 1/n*Sum(all PEs) from the default output.

When I run the loop 10 times, my PRESS is ~25.

I have a few questions:

1)  I must now confess my ignorance: how does one interpret my PRESS of
25 ?  Are there some internet resources that someone could point me to
to help in the interpretation ?  I've spent most of yesterday studying
up on things but feel like I am chasing my tail.  Most of the resources
are either so heavy in theory that I can't puzzle them out or are a
couple of paragraphs long and don't have an example with data in them.  Is
my PRESS in essence saying that my model performance is ~ 75% ? (I
suspect not, but I don't know, thus I ask)

2)  All my observations are spatial in nature and thus I would like to
plot out spatially where the model is performing well and where it is
not.  This would be somewhat akin to inspecting residuals in OLS. Is
there a way to output from cv.glm() the PEs for individual data points ?

3)  My previous idea was to look at AIC, BIC, McFaddenR2 and PseudoR2 as
Goodness of Fit measures of each subset model.  It appears that I can
modify the cost function of cv.glm() but I am not too confident in my
ability to write the correct cost function.  Are there other valid
measures of GOF for my negative binomial model that I can substitute
into the cost function of cv.glm() ?  Would anyone care to recommend one
(or many) ?

-Don

PS - if you've seen my previous posts, I've abandoned my 80/20 split
validation scheme.

--

-Don

Don Catanzaro, PhD                  Landscape Ecologist
dgcatanzaro at gmail.com               16144 Sigmond Lane
479-751-3616                        Lowell, AR 72745

```
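For readers following the three questions, here is a minimal sketch in R of what they might look like in practice. It is not the poster's code: the data are simulated, and the names `dat`, `x`, `lon`, `lat`, `pe` and `mae` are assumptions made for illustration. It shows the default `cv.glm()` output, a hand-written K-fold loop that keeps the prediction error for each observation (which `cv.glm()` itself only returns in aggregate), and an alternative cost function. With the default cost, `delta` is a mean squared error on the response scale, so its square root is in units of counts and not a percentage.

```r
library(MASS)   # glm.nb()
library(boot)   # cv.glm()

set.seed(1)

# Simulated stand-in data; the real covariates and coordinates are unknown.
n   <- 200
dat <- data.frame(x = runif(n), lon = runif(n), lat = runif(n))
dat$y <- rnbinom(n, mu = exp(0.5 + 1.2 * dat$x), size = 2)

fit <- glm.nb(y ~ x, data = dat)

# Default cost: mean((y - yhat)^2) with yhat predicted on the response scale.
# delta[1] is the raw K-fold estimate, delta[2] a bias-adjusted version.
cv10 <- cv.glm(dat, fit, K = 10)
cv10$delta

# Per-observation prediction errors from a manual 10-fold loop.
K    <- 10
fold <- sample(rep(seq_len(K), length.out = n))
dat$pe <- NA_real_
for (k in seq_len(K)) {
  held  <- fold == k
  fit_k <- glm.nb(y ~ x, data = dat[!held, ])
  yhat  <- predict(fit_k, newdata = dat[held, ], type = "response")
  dat$pe[held] <- dat$y[held] - yhat
}

mean(dat$pe^2)        # same scale as cv10$delta[1] (squared counts)
sqrt(mean(dat$pe^2))  # root mean squared error, in counts

# dat$pe can now be mapped against dat$lon and dat$lat to see where
# the model predicts well or poorly.

# A cost function takes observed and predicted responses, in that order.
# Mean absolute error is one simple alternative to the default.
mae    <- function(y, yhat) mean(abs(y - yhat))
cv_mae <- cv.glm(dat, fit, cost = mae, K = 10)
cv_mae$delta
```

The manual loop and `cv.glm()` use different random fold assignments, so their error estimates should be similar in magnitude but will not match exactly.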