# [R] Re: SVM questions

David Meyer david.meyer at ci.tuwien.ac.at
Tue Aug 20 16:56:31 CEST 2002

```>
> So i guess from your prev. email the svmModel\$coefs correspond to the
> "Alpha" .

yes (times the sign of y!).

>
> Why do I see three columns in the coefs?( Is this the  number of classes -1
> = Numbe of hyperplanes)

yes, but in a packed format which is not trivial.

I attach some explanation I sent to R-help some time ago (the guy wanted
to write his own predict method).

> Is there a way to determine from the SV's key components of which in this
> case would be which of the 2000 genes helped in the classifying of a vector
> for a class(we have 4 classes in this example) that attributes the to
> classification .
>

Yes: the elements of the SV-component of the model returned by svm() are
grouped by class, and the number of SV are stored in the nSV component.

So you should get the start indices with sth. like
cumsum(c(1,model\$nSV)).

g.,

David.

----------------

For class prediction in the binary case, the class of a new data vector
``n'' is usually given by *the sign* of

Sum(a_i * y_i * K(x_i, n)) + rho
i

where x_i is the i-th support vector, y_i the corresponding label, a_i
the corresponding coefficiant, and K is the kernel (in your case, the
linear one, so
K(u,v) = u'v).

Now, ``libsvm'' actually returns a_i * y_i as i-th coefficiant and the
*negative* rho, so in fact uses the formula:

Sum(coef_i * K(x_i, n)) - rho
i

where the training examples (=training data) are labeled {1,-1}.

A simplified R function for prediction with linear kernel would be:

svmpred <- function (m, newdata, K=crossprod) {
## this guy does the computation:
pred.one <- function (x)
sign(sum(sapply(1:m\$tot.nSV, function (j)
K(m\$SV[j,], x) * m\$coefs[j]
)
) - m\$rho
)

## this is just for convenience:
if (is.vector(newdata))
newdata <- t(as.matrix(x))
sapply (1:nrow(newdata),
function (i) pred.one(newdata[i,]))
}

where ``pred.one'' does the actual prediction for one new data vector,
the rest is just a convenience for prediction of multiple new examples.

It's easy to extend this to other kernels, just replace ``K'' with the
appropriate function [see the help page for the formulas used] and

Note, however, that multi-class prediction is more complicated, because
the coefficiants of the diverse binary svm's are stored in a compressed
format.

---------------------------------------------------------

To handle k classes, k>2, svm trains all binary subclassifiers
(one-against-one-method) and then uses a voting mechanism to determine
the actual class.

Now, this means k(k-1)/2 classifiers, hence in principle k(k-1)/2 sets
of
SV's, coefficiants and rhos. These are stored in a compressed format:

1) Only one SV is stored in case it were used by several clasifiers. The
model\$SV-matrix is ordered by classes, and you find the starting indices
by using nSV (number of SV's):

start <- c(1, cumsum(model\$nSV))
start <- start[-length(start)]

sum(nSV) equals the total nr. of (distinct) SVs.

2) The coefficients of the SV's are stored in the model\$coefs-matrix,
grouped by classes. Because the separating hyperplanes found by the
svm-algorithm has SV's on both sides, you will have two sets of
coefficients per binary classifier, and e.g., for 3 classes, you could
build a *block*-matrix like this for the classifiers (i,j)
[i,j=class numbers):

j     0         1        2
i
0        X     set (0,1)  set (0,2)
1    set (1,0)     X      set (1,2)
2    set (2,0) set (2,1)     X

where set(i,j) are the coefficients for the classifier (i,j), lying on
the side of class j.

Because there are no entries for (i,i), we can save the diagonal and
shift up the lower triangular matrix to get

j     0         1        2
i
0    set (1,0) set (0,1)  set (0,2)
1    set (2,0) set (2,1)  set (1,2)

Each set (.,j) has length nSV[j], so of course, there will be some
filling 0s in some sets.

model\$coefs is the *transposed* of such a matrix, therefore for the
Glass Data which has 6 classes, you get 6-1=5 columns.

The coefficients of (i,j) start at

m\$coefs[start[i],j]

and those of (j,i) at

m\$coefs[start[j],i-1) .

The k(k-1)/2 rhos are just linearly stored in the vector m\$rho.

The following code shows how to use this for prediction:

## Linear Kernel function
K <- function(i,j) crossprod(i,j)

predsvm <- function(object, newdata) {
## compute start-index
start <- c(1, cumsum(object\$nSV)+1)
start <- start[-length(start)]

## compute kernel values
kernel <- sapply (1:object\$tot.nSV,
function (x) K(object\$SV[x,], newdata))

## compute raw prediction for classifier (i,j)
predone <- function (i,j) {
## ranges for class i and j:
ri <- start[i] : (start[i] + object\$nSV[i] - 1)
rj <- start[j] : (start[j] + object\$nSV[j] - 1)

## coefs for (i,j):
coef1 <- object\$coefs[ri, j-1]
coef2 <- object\$coefs[rj, i]

## return raw values:
crossprod(coef1, kernel[ri]) + crossprod(coef2, kernel[rj])
}

## compute votes for all classifiers
c <- 0 # rho counter
for (i in 1 : (object\$nclasses - 1))
for (j in (i + 1) : object\$nclasses)
if (predone(i,j) > object\$rho[c <- c + 1])
else

## return winner (index with max. votes)