[R] SVM Param Tuning with using SNOW package
David Winsemius
dwinsemius at comcast.net
Wed Nov 18 15:44:09 CET 2009
I cannot really be sure what you are trying to do, but doing a bit of
"surgery" on your code lets it run on a multicore Mac:
library(e1071)
library(snow)
library(pls)
data(gasoline)
X=gasoline$NIR
Y=gasoline$octane
NR=10
cost1=seq(0.5,30, length=NR)
sv.lin<- function(c) {
for (i in 1:NR) {
ind=sample(1:60,50)
gTest<- data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
svm.lin <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
cross=5)
results.lin <- predict(svm.lin, gTest$X)
e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
return(e.test.lin)
}
}
cl<- makeCluster(2, type="SOCK" )
clusterEvalQ(cl, library(e1071))
cost1=seq(0.5,30, length=NR)
clusterExport(cl,c("NR","Y","X", "cost1"))
# Pretty sure you need a copy of cost1 on each node.
RMSEP<-clusterApply(cl, cost1, sv.lin)
# I thought the second argument was the matrix or vector over which to
# iterate.
stopCluster(cl)
# Since I don't know what the model meant, I cannot determine whether
# this result is interpretable:
> RMSEP
[[1]]
[1] 0.1921887
[[2]]
[1] 0.1924917
[[3]]
[1] 0.1885066
[[4]]
[1] 0.1871466
[[5]]
[1] 0.3550932
[[6]]
[1] 0.1226460
[[7]]
[1] 0.2426345
[[8]]
[1] 0.2126299
[[9]]
[1] 0.2276286
[[10]]
[1] 0.2064534
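For anyone unsure what the second argument does: clusterApply(cl, x, fun, ...) calls fun once per element of x, passing that element as the first argument, much as lapply(x, fun, ...) does serially. The sketch below uses lapply as a stand-in so that it runs without a cluster. The names toy.lin and toy.avg are hypothetical, and the arithmetic inside them is a placeholder, not an SVM fit. It also illustrates that return() inside the for loop ends the function on the first pass, so each call to sv.lin above appears to fit only one model per cost value.

```r
NR <- 10
cost1 <- seq(0.5, 30, length.out = NR)

# Same shape as sv.lin(): return() inside the loop exits on the first pass,
# so only i = 1 is ever evaluated.
toy.lin <- function(c) {
  for (i in 1:NR) {
    return(c[i])   # c has length 1 here, so c[1] is the single cost value
  }
}

# Serial stand-in for clusterApply(cl, cost1, toy.lin):
# one call per element of cost1, each receiving a single cost.
res <- lapply(cost1, toy.lin)

# Hypothetical variant that really runs `reps` repeats per cost and
# averages them. The "error" is placeholder arithmetic standing in for
# the fit/predict/RMSEP steps.
toy.avg <- function(cost, reps) {
  err <- numeric(reps)
  for (i in seq_len(reps)) {
    err[i] <- cost + i
  }
  mean(err)
}

# Extra named arguments are passed through to every call, as they would be
# with clusterApply(cl, cost1, toy.avg, reps = NR).
res.avg <- lapply(cost1, toy.avg, reps = NR)
```

If averaging over NR random splits per cost was the intent, the loop in sv.lin would need to store each error and return their mean after the loop, along the lines of toy.avg.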
--
David Winsemius, MD
On Nov 18, 2009, at 7:09 AM, raluca wrote:
>
> Hi Charlie,
>
>
> Yes, you are perfectly right: when I make the cluster I should put 2, not
> 10 (it remained 10 from previous trials with 10 slaves).
>
> cl<- makeCluster(2, type="SOCK" )
>
> To tell the truth, I do not understand very well what the 2nd parameter
> for clusterApplyLB() has to be.
>
> If the function sv.lin has just 1 parameter, sv.lin(c), where c is the
> cost, how should I call clusterApplyLB?
>
>
> ? clusterApplyLB(cl, ?, sv.lin, c=cost1) ?
>
>
>
> Below, I am providing a working example, using the gasoline data that
> comes in the pls package.
>
> Thank you for your time!
>
>
> library(e1071)
> library(snow)
> library(pls)
>
> data(gasoline)
>
> X=gasoline$NIR
> Y=gasoline$octane
>
> NR=10
> cost1=seq(0.5,30, length=NR)
>
>
> sv.lin<- function(c) {
>
> for (i in 1:NR) {
>
> ind=sample(1:60,50)
> gTest<- data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>
> svm.lin <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
> cross=5)
> results.lin <- predict(svm.lin, gTest$X)
>
> e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>
> return(e.test.lin)
> }
> }
>
>
> cl<- makeCluster(2, type="SOCK" )
>
>
> clusterEvalQ(cl,library(e1071))
>
>
> clusterExport(cl,c("NR","Y","X"))
>
>
> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>
> stopCluster(cl)
>
>
>
>
>
> cls59 wrote:
>>
>>
>> raluca wrote:
>>>
>>> Hello,
>>>
>>> This is the first time I am using the SNOW package, and I am trying to
>>> tune the cost parameter for a linear SVM, where the cost (variable
>>> cost1) takes 10 values between 0.5 and 30.
>>>
>>> I have a large dataset and a PC which is not very powerful, so I need to
>>> tune the parameters using both CPUs of the PC.
>>>
>>> Somehow I cannot manage to do it. It seems that both CPUs are fitting the
>>> model for the same values of cost1, I guess the first 5, but not for the
>>> last 5.
>>>
>>> Please, can anyone help me?
>>>
>>> Here is the code:
>>>
>>> data <- data.frame(Y=I(Y),X=I(X))
>>> data.X<-data$X
>>> data.Y<-data$Y
>>>
>>>
>>
>>
>> Helping you will be difficult as we're only three lines into your example
>> and already I have no idea what the data you are using looks like.
>> Example code needs to be fully reproducible-- that means a small slice of
>> representative data needs to be provided or faked using an appropriate
>> random number generator.
>>
>> Some things did jump out at me about your approach and I've made some
>> notes below.
>>
>>
>>
>> raluca wrote:
>>>
>>> NR=10
>>> cost1=seq(0.5,30, length=NR)
>>>
>>> sv.lin<- function(cl,c) {
>>>
>>> for (i in 1:NR) {
>>>
>>> ind=sample(1:414,276)
>>>
>>> hogTest<- data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>> hogTrain<- data.frame(Y=I(data.Y[ind]),X=I(data.X[ind,]))
>>>
>>> svm.lin <- svm(hogTrain$X,hogTrain$Y,
>>> kernel="linear",cost=c[i],
>>> cross=5)
>>> results.lin <- predict(svm.lin, hogTest$X)
>>>
>>> e.test.lin <- sqrt(sum((results.lin-hogTest$Y)^2)/
>>> length(hogTest$Y))
>>>
>>> return(e.test.lin)
>>> }
>>> }
>>>
>>> cl<- makeCluster(10, type="SOCK" )
>>>
>>
>>
>> If your machine has two cores, why are you setting up a cluster with 10
>> nodes? Usually the number of nodes should equal the number of cores on
>> your machine in order to keep things efficient.
>>
>>
>>
>> raluca wrote:
>>>
>>>
>>> clusterEvalQ(cl,library(e1071))
>>>
>>> clusterExport(cl,c("data.X","data.Y","NR","cost1"))
>>>
>>> RMSEP<-clusterApplyLB(cl,cost1,sv.lin)
>>>
>>
>>
>> Are you sure this evaluation even produces results? sv.lin() is a function
>> you defined above that takes two parameters-- "cl" and "c".
>> clusterApplyLB() will feed values of cost1 into sv.lin() for the argument
>> "cl", but it has nothing to give for "c". At the very least, it seems
>> like you would need something like:
>>
>> RMSEP <- clusterApplyLB( cl, cost1, sv.lin, c = someVector )
>>
>>
>>
>> raluca wrote:
>>>
>>>
>>> stopCluster(cl)
>>>
>>>
>>
>>
>> Sorry I can't be very helpful, but with no data and no apparent way to
>> legally call sv.lin() the way you have it set up, I can't investigate the
>> problem to see if I get the same results you described. If you could
>> provide a complete working example, then there's a better chance that
>> someone on this list will be able to help you.
>>
>> Good luck!
>>
>> -Charlie
>>
>
> --
> View this message in context: http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26406709.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.