[R] SVM Param Tuning with using SNOW package
David Winsemius
dwinsemius at comcast.net
Thu Nov 19 04:50:41 CET 2009
On Nov 18, 2009, at 12:35 PM, Max Kuhn wrote:
> On Tue, Nov 17, 2009 at 6:01 PM, raluca <ucagui at hotmail.com> wrote:
>>
>> Hello,
>>
>> This is the first time I am using the snow package, and I am trying to
>> tune the cost parameter for a linear SVM, where the cost (variable
>> cost1) takes 10 values between 0.5 and 30.
>>
>> I have a large dataset and a PC that is not very powerful, so I need
>> to tune the parameters using both CPUs of the PC.
>>
>> Somehow I cannot manage to do it. It seems that both CPUs are fitting
>> the model for the same values of cost1 (I guess the first 5), but not
>> for the last 5.
>>
>> Please, can anyone help me! :-((
>
> This is pretty easy to do with the train() function in the caret
> package. From ?train, here is an example for a different data set:
>
>> library(caret)
>> library(snow)
>> library(mlbench)
>>
>> data(BostonHousing)
>>
>> mpiCalcs <- function(X, FUN, ...)
> + {
> + theDots <- list(...)
> + parLapply(theDots$cl, X, FUN)
> + }
>>
>> library(snow)
>> cl <- makeCluster(5, "MPI")
>>
>> ## 50 bootstrap models distributed across 5 workers
>> mpiControl <- trainControl(workers = 5,
> + number = 50,
> + computeFunction = mpiCalcs,
> + computeArgs = list(cl = cl))
>> set.seed(1)
>> usingMPI <- train(medv ~ .,
> + data = BostonHousing,
> + "svmLinear",
> + tuneGrid = data.frame(.C = seq(.5, 30, length = 10)),
> + trControl = mpiControl)
>>
>> stopCluster(cl)
> [1] 1
>
Well, that _was_ interesting. I submitted this job, modified so that the
cluster size and the number of workers were both eight, on a Mac Pro
(with 8 cores and 16 GB) and watched the CPU usage as reported by
Activity Monitor.app. The CPU activity is divided into system and user,
and over the course of that run (which took several minutes) the system
proportion gradually rose to about 75% of the total.
Was it your expectation that this task was comparable in complexity to
that offered by the OP?
And should I be looking for a tangible result? Looking at usingMPI
with str(), I see what I first took for a 50 x 506 matrix of integers
but is actually a list, usingMPI$control$index, as well as quite a bit
of other material that looks like input and side-effects of the
multi-processor activity or setup.
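
For what it is worth, here is a minimal sketch of where the tuning output sits in an object returned by train(). It assumes a current caret with kernlab and mlbench installed, runs sequentially with only 5 bootstrap resamples and 3 cost values to keep it quick, and uses the illustrative name `fit` rather than usingMPI, so it is not the run described above:

```r
library(caret)    # train(), trainControl()
library(mlbench)  # BostonHousing data

data(BostonHousing)

# Small sequential run (assumption: no cluster), just to get an object to inspect
set.seed(1)
fit <- train(medv ~ .,
             data = BostonHousing,
             method = "svmLinear",   # fitted through the kernlab package
             tuneGrid = data.frame(C = c(0.5, 15, 30)),
             trControl = trainControl(method = "boot", number = 5))

print(fit)                    # resampled performance for each cost value
fit$results                   # the same summary as a data frame, one row per C
fit$bestTune                  # the cost value that was selected
length(fit$control$index)     # one list element per bootstrap resample
str(fit$control$index[[1]])   # integer row indices used for training in resample 1
```

If this reading is right, the list seen with str() holds the resampling indices, and the tangible result is in the results and bestTune components.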
--
David
David Winsemius, MD
Heritage Laboratories
West Hartford, CT