[R] e1071 SVM, cross-validation and overfitting

Robert Poor rdpoor at gmail.com
Tue Jan 15 20:11:34 CET 2013


I am accustomed to the LIBSVM package, which provides cross-validation
during training via the -v option:

  % svm-train -v 5 ...

This does 5-fold cross-validation while building the model and, as I
understand it, helps avoid over-fitting.

But I don't see how to accomplish that in the e1071 package.  (I
learned that svm(... cross=5 ...) only _tests_ using cross-validation
-- it doesn't affect the training.)  Can someone clue me in on how to
do something equivalent to LIBSVM's -v option?
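
For reference, here is a minimal sketch of what the cross argument reports
for a regression fit like the one in the test case. The data are an
illustrative assumption, generated the same way as in the test case; MSE
and tot.MSE are the components that e1071's svm() documents for regression
models fitted with cross > 0. The final model is still fitted on all of
the data, and the cross-validation results are reported alongside it.

```r
library(e1071)

set.seed(23)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + cos(x) + rnorm(length(x), sd = 0.2)

# cross = 5 fits the model on all the data as usual and, in addition,
# runs a 5-fold cross-validation to estimate the prediction error.
m <- svm(x, y, cross = 5)

m$MSE      # mean squared error of each of the 5 folds
m$tot.MSE  # overall cross-validated mean squared error
```

The cross-validated error is a diagnostic only; it does not change the
fitted model, so any selection based on it has to be done by the caller.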

Thanks!

- ff

P.S.: My test case follows.  If you run the code, the "tuned" output
shows clear signs of over-fitting.  I'd like to eliminate that.

library(e1071)
colors <- c(2, 3, 4, 5, 6)
set.seed(23) # set random seed for repeatability

# log(x) + cos(x) + noise
f <- function(x) log(x) + cos(x)
x <- seq(0.1, 5, by = 0.05)
y <- f(x) + rnorm(length(x), sd = 0.2) # one noise draw per x value
plot(x, y, col = "gray80")

legend("topleft",
       c("log(x) + cos(x)", "SVM, untuned", "SVM, tuned"),
       bty = "n",
       col = colors[1:3], # one color per legend entry
       lty = 1)           # the curves are drawn as lines

lines(x, f(x), col = colors[1]) # overlay noiseless data

# SVM, untuned
svmmodel1 <- svm(x, y)
print(summary(svmmodel1))
y1 <- predict(svmmodel1, x)
lines(x, y1, col = colors[2])

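# SVM with tuning: grid search over gamma and cost
# (tune.svm scores each combination by 10-fold cross-validation by default)
tuning <- tune.svm(x, y, gamma = 2^(-4:4), cost = 2^(-2:2))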
svmmodel2 <- tuning$best.model
print(summary(svmmodel2))
y2 <- predict(svmmodel2, x)
lines(x, y2, col = colors[3])
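
One way to check whether the tuned model really over-fits is to tune with
an explicit 5-fold cross-validation and then score the result on points
that were kept out of the tuning. The sketch below is only an assumption
about how this could be set up: the 30% hold-out split and the names
test_idx, train, test and test_mse are illustrative and not part of the
test case above. tune.control() and its sampling and cross arguments are
part of e1071.

```r
library(e1071)

set.seed(23)
f <- function(x) log(x) + cos(x)
x <- seq(0.1, 5, by = 0.05)
y <- f(x) + rnorm(length(x), sd = 0.2)

# Hold out a random 30% of the points (an arbitrary, illustrative split).
test_idx <- sample(length(x), size = round(0.3 * length(x)))
train <- data.frame(x = x[-test_idx], y = y[-test_idx])
test  <- data.frame(x = x[test_idx],  y = y[test_idx])

# Grid search on the training part only, scored by 5-fold cross-validation.
tuned <- tune.svm(y ~ x, data = train,
                  gamma = 2^(-4:4), cost = 2^(-2:2),
                  tunecontrol = tune.control(sampling = "cross", cross = 5))

tuned$best.parameters   # gamma and cost with the lowest CV error
tuned$best.performance  # that lowest cross-validated MSE

# Error on the held-out points, which played no part in the tuning.
pred <- predict(tuned$best.model, newdata = test)
test_mse <- mean((test$y - pred)^2)
test_mse
```

If test_mse is much larger than tuned$best.performance, that points to
over-fitting; if the two are close, the wiggles in the tuned curve may
simply reflect the noise level and the grid of gamma values searched.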


