cv.glm {boot} | R Documentation |

## Cross-validation for Generalized Linear Models

### Description

This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.

### Usage

```
cv.glm(data, glmfit, cost, K)
```

### Arguments

`data` |
A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response. |

`glmfit` |
An object of class `"glm"` containing the results of a generalized linear model fitted to `data`. |

`cost` |
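A function of two vector arguments specifying the cost function for the cross-validation. The first argument to `cost` should correspond to the observed responses and the second argument to the predicted or fitted responses from the generalized linear model. `cost` must return a non-negative scalar value. The default is the average squared error function. |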

`K` |
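The number of groups into which the data should be split to estimate the cross-validation prediction error. The value of `K` must be such that all groups are of approximately equal size. If the supplied value of `K` does not satisfy this criterion, it is set to the closest integer that does, and a warning is generated specifying the value of `K` used. The default is to set `K` equal to the number of observations in `data`, which gives the usual leave-one-out cross-validation. |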
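
As an illustration of the `cost` argument, the sketch below defines three candidate cost functions. Each takes the observed responses first and the predictions second, and returns a single non-negative number. The function names and the toy vectors `y` and `yhat` are hypothetical and are not part of the package.

```r
# Average squared error; a natural choice for a continuous response.
squared_error <- function(y, yhat) mean((y - yhat)^2)

# Misclassification rate for a 0/1 response with predicted probabilities,
# classifying at a 0.5 threshold.
misclass <- function(y, yhat) mean(abs(y - yhat) > 0.5)

# Mean absolute error, a less outlier-sensitive alternative.
abs_error <- function(y, yhat) mean(abs(y - yhat))

# Toy observed responses and predicted probabilities (made up for illustration)
y    <- c(0, 1, 1, 0)
yhat <- c(0.2, 0.9, 0.4, 0.6)

squared_error(y, yhat)
misclass(y, yhat)
abs_error(y, yhat)
```

Any of these could be passed as the third argument of `cv.glm`; which one is appropriate depends on the type of response.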

### Details

The data is divided randomly into `K` groups. For each group the generalized
linear model is fit to `data` omitting that group, then the function `cost`
is applied to the observed responses in the group that was omitted from the fit
and the prediction made by the fitted models for those observations.
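
The procedure described above can be sketched by hand in base R. This is a minimal illustration, not the package's implementation: it assumes the built-in `mtcars` data, a hypothetical model `mpg ~ wt`, `K = 4`, and an average squared error cost, and it assumes that the per-group costs are combined with weights proportional to group size.

```r
set.seed(1)                      # arbitrary seed so the split is repeatable
dat  <- mtcars                   # built-in data, used only as an example
K    <- 4
cost <- function(y, yhat) mean((y - yhat)^2)

n <- nrow(dat)
# Randomly assign each row to one of K groups of near-equal size
fold <- sample(rep(seq_len(K), length.out = n))

fold_cost <- numeric(K)
fold_size <- numeric(K)
for (k in seq_len(K)) {
  held_out <- fold == k
  # Fit the model with group k omitted
  fit  <- glm(mpg ~ wt, data = dat[!held_out, ])
  # Predict the omitted observations on the response scale
  pred <- predict(fit, newdata = dat[held_out, ], type = "response")
  fold_cost[k] <- cost(dat$mpg[held_out], pred)
  fold_size[k] <- sum(held_out)
}

# Combine the per-group costs, weighting by each group's share of the data
cv_raw <- sum(fold_cost * fold_size / n)
cv_raw
```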

When `K` is the number of observations, leave-one-out cross-validation is used
and all the possible splits of the data are used. When `K` is less than
the number of observations, the `K` splits to be used are found by randomly
partitioning the data into `K` groups of approximately equal size. In this
latter case a certain amount of bias is introduced. This can be reduced by
using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997).
The second value returned in `delta` is the estimate adjusted by this method.
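
The sketch below shows one way such an adjustment can be computed by hand. It assumes the adjustment has the form: raw estimate, plus the cost of the full-data fit evaluated on all observations, minus the size-weighted average of the cost of each reduced fit evaluated on all observations. That form is an assumption made for this illustration and should be checked against the cited equation; the data (`mtcars`), model (`mpg ~ wt`) and `K = 4` are likewise hypothetical.

```r
set.seed(1)                      # arbitrary seed so the split is repeatable
dat  <- mtcars
K    <- 4
cost <- function(y, yhat) mean((y - yhat)^2)
n    <- nrow(dat)
fold <- sample(rep(seq_len(K), length.out = n))

# Cost of the model fitted to, and evaluated on, all of the data
full_fit <- glm(mpg ~ wt, data = dat)
apparent <- cost(dat$mpg, fitted(full_fit))

cv_raw        <- 0   # raw K-fold estimate
all_data_term <- 0   # weighted cost of the reduced fits on the full data
for (k in seq_len(K)) {
  held_out <- fold == k
  p_k <- sum(held_out) / n       # share of observations in group k
  fit <- glm(mpg ~ wt, data = dat[!held_out, ])
  cv_raw <- cv_raw +
    p_k * cost(dat$mpg[held_out],
               predict(fit, dat[held_out, ], type = "response"))
  all_data_term <- all_data_term +
    p_k * cost(dat$mpg, predict(fit, dat, type = "response"))
}

# Assumed form of the adjusted estimate
cv_adj <- cv_raw + apparent - all_data_term
c(raw = cv_raw, adjusted = cv_adj)
```

For a least-squares fit with squared error cost, the full-data fit cannot do worse on the full data than any reduced fit, so under this form the adjusted value is never larger than the raw one.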

### Value

The returned value is a list with the following components.

`call` |
The original call to `cv.glm`. |

`K` |
The value of `K` used for the K-fold cross-validation. |

`delta` |
A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation. |

`seed` |
The value of `.Random.seed` when `cv.glm` was called. |

### Side Effects

The value of `.Random.seed` is updated.
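
Because the random partition draws on R's random number generator, results for `K` smaller than the number of observations vary from call to call unless the seed is fixed. The following sketch shows one way to make a run repeatable; the `mtcars` model and the seed value 42 are arbitrary choices for illustration.

```r
library(boot)   # assumed to be installed

fit <- glm(mpg ~ wt, data = mtcars)

# Reset the generator to the same state before each call
set.seed(42)
first  <- cv.glm(mtcars, fit, K = 4)$delta
set.seed(42)
second <- cv.glm(mtcars, fit, K = 4)$delta

# The two runs use the same partition, so the estimates should match
identical(first, second)
```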

### References

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984)
*Classification and Regression Trees*. Wadsworth.

Burman, P. (1989) A comparative study of ordinary cross-validation,
*v*-fold cross-validation and repeated learning-testing methods.
*Biometrika*, **76**, 503–514.

Davison, A.C. and Hinkley, D.V. (1997)
*Bootstrap Methods and Their Application*. Cambridge University Press.

Efron, B. (1986) How biased is the apparent error rate of a prediction rule?
*Journal of the American Statistical Association*, **81**, 461–470.

Stone, M. (1974) Cross-validation choice and assessment of statistical
predictions (with Discussion).
*Journal of the Royal Statistical Society, B*, **36**, 111–147.

### See Also

### Examples

```
# boot provides cv.glm, glm.diag and the nodal data set
library(boot)
# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
data(mammals, package="MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
(cv.err <- cv.glm(mammals, mammals.glm)$delta)
(cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)$delta)
# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- fitted(mammals.glm)
mammals.diag <- glm.diag(mammals.glm)
(cv.err <- mean((mammals.glm$y - muhat)^2/(1 - mammals.diag$h)^2))
# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set. Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
(cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta)
(cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta)
```

Package *boot* version 1.3-31