[R] Recursive Feature Elimination with SVM
Bert Gunter
bgunter@4567 @ending from gm@il@com
Wed Jan 2 17:18:13 CET 2019
Note: **NOT** reproducible (only you have "data.csv").
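
For readers without the file, here is a minimal sketch of what a reproducible version might look like. It assumes simulated data in place of "data.csv": the object name sim, the sizes and the column names f1..f8 are made up for illustration. It separates the feature matrix from the Class column, as David suggested, and then calls the ranking function quoted further down. The only change to that function is drop = FALSE, so that x stays a matrix when a single feature is left. The call is guarded so the script still runs if e1071 is not installed.

```r
# Simulated stand-in for "data.csv" (hypothetical: 60 rows, 8 features, 2 classes).
set.seed(998)
n <- 60
p <- 8
sim <- as.data.frame(matrix(rnorm(n * p), nrow = n,
                            dimnames = list(NULL, paste0("f", seq_len(p)))))
sim$Class <- factor(rep(c("a", "b"), each = n / 2))
# Shift the first two features by class so a ranking has something to find.
sim$f1 <- sim$f1 + ifelse(sim$Class == "b", 2, 0)
sim$f2 <- sim$f2 - ifelse(sim$Class == "b", 2, 0)

# Features go in x as a numeric matrix WITHOUT the label column; labels go in y.
x <- as.matrix(sim[, setdiff(names(sim), "Class")])
y <- sim$Class

# Ranking function from the thread; drop = FALSE is the only addition.
svmrfeFeatureRanking <- function(x, y) {
  stopifnot(!is.null(x), !is.null(y))
  n <- ncol(x)
  survivingFeaturesIndexes <- seq_len(n)
  featureRankedList <- vector(length = n)
  rankedFeatureIndex <- n
  while (length(survivingFeaturesIndexes) > 0) {
    # train a linear SVM on the surviving features
    svmModel <- e1071::svm(x[, survivingFeaturesIndexes, drop = FALSE], y,
                           cost = 10, cachesize = 500, scale = FALSE,
                           type = "C-classification", kernel = "linear")
    # weight vector and squared-weight ranking criterion
    w <- t(svmModel$coefs) %*% svmModel$SV
    rankingCriteria <- w * w
    ranking <- sort(rankingCriteria, index.return = TRUE)$ix
    # record the weakest surviving feature, then eliminate it
    featureRankedList[rankedFeatureIndex] <- survivingFeaturesIndexes[ranking[1]]
    rankedFeatureIndex <- rankedFeatureIndex - 1
    survivingFeaturesIndexes <- survivingFeaturesIndexes[-ranking[1]]
  }
  featureRankedList
}

# The function must actually be called; defining it alone prints nothing.
if (requireNamespace("e1071", quietly = TRUE)) {
  ranked <- svmrfeFeatureRanking(x, y)
  print(colnames(x)[ranked])  # feature names, best-ranked first
}
```

Two things this sketch makes explicit: passing the whole data frame as x would put the Class column among the features, and a function that is defined but never called returns nothing. Whether either applies to the original problem cannot be told without the data.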
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Jan 1, 2019 at 11:14 PM Priyanka Purkayastha <ppurkayastha2010 using gmail.com> wrote:
> This is the code I tried,
>
> library(e1071)
> library(caret)
> library(ROCR)
>
> data <- read.csv("data.csv", header = TRUE)
> set.seed(998)
>
> inTraining <- createDataPartition(data$Class, p = .70, list = FALSE)
> training <- data[ inTraining,]
> testing <- data[-inTraining,]
>
> while(length(data)>0){
>
> ## Building the model ####
> svm.model <- svm(Class ~ ., data = training,
> cross=10, metric="ROC", type="eps-regression", kernel="linear",
> na.action=na.omit, probability = TRUE)
> print(svm.model)
>
>
> ###### auc measure #######
>
> #prediction and ROC
> svm.model$index
> svm.pred <- predict(svm.model, testing, probability = TRUE)
>
> #calculating auc
> c <- as.numeric(svm.pred)
> c = c - 1
> pred <- prediction(c, testing$Class)
> perf <- performance(pred,"tpr","fpr")
> plot(perf,fpr.stop=0.1)
> auc <- performance(pred, measure = "auc")
> auc <- auc@y.values[[1]]
> print(length(data))
> print(auc)
>
> #compute the weight vector
> w = t(svm.model$coefs)%*%svm.model$SV
>
> #compute ranking criteria
> weight_matrix = w * w
>
> #rank the features
> w_transpose <- t(weight_matrix)
> w2 <- as.matrix(w_transpose[order(w_transpose[,1], decreasing = FALSE),])
> a <- as.matrix(w2[which(w2 == max(w2)),]) #to get the rows with minimum values
> row.names(a) -> remove
> training<- data[,setdiff(colnames(data),remove)]
> }
>
> On Wed, Jan 2, 2019 at 11:18 AM David Winsemius <dwinsemius using comcast.net>
> wrote:
>
> >
> > On 1/1/19 5:31 PM, Priyanka Purkayastha wrote:
> > > Thank you, David. I tried the same: I gave x as the data matrix and y
> > > as the class label. But it returned an empty "featureRankedList". I
> > > get no output when I try the code.
> >
> >
> > If you want people to spend time on this you should post a reproducible
> > example. See the Posting Guide ... and learn to post in plain text.
> >
> >
> > --
> >
> > David
> >
> > >
> > > On Tue, 1 Jan 2019 at 11:42 PM, David Winsemius
> > > <dwinsemius using comcast.net> wrote:
> > >
> > >
> > > On 1/1/19 4:40 AM, Priyanka Purkayastha wrote:
> > > > I have a dataset (data) with 700 rows and 7000 columns. I am trying
> > > > to do recursive feature selection with an SVM model. A quick Google
> > > > search turned up code for a recursive search with SVM. However, I do
> > > > not understand the first part of the code. How do I pass my dataset
> > > > to it?
> > >
> > >
> > > Generally the "labels" are given to such a machine learning device as
> > > the y argument, while the "features" are passed as a matrix to the x
> > > argument.
> > >
> > >
> > > --
> > >
> > > David.
> > >
> > > >
> > > > Suppose the dataset is a matrix named data. Please give me an example
> > > > of recursive feature selection with SVM. Below is the code I found for
> > > > recursive feature search.
> > > >
> > > > svmrfeFeatureRanking = function(x,y){
> > > >
> > > > #Checking for the variables
> > > > stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > > >
> > > > n = ncol(x)
> > > > survivingFeaturesIndexes = seq_len(n)
> > > > featureRankedList = vector(length=n)
> > > > rankedFeatureIndex = n
> > > >
> > > > while(length(survivingFeaturesIndexes)>0){
> > > > #train the support vector machine
> > > > svmModel = svm(x[, survivingFeaturesIndexes], y, cost = 10,
> > > > cachesize=500, scale=FALSE, type="C-classification",
> > > > kernel="linear")
> > > >
> > > > #compute the weight vector
> > > > w = t(svmModel$coefs)%*%svmModel$SV
> > > >
> > > > #compute ranking criteria
> > > > rankingCriteria = w * w
> > > >
> > > > #rank the features
> > > > ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > > >
> > > > #update feature ranked list
> > > > featureRankedList[rankedFeatureIndex] =
> > > > survivingFeaturesIndexes[ranking[1]]
> > > > rankedFeatureIndex = rankedFeatureIndex - 1
> > > >
> > > > #eliminate the feature with smallest ranking criterion
> > > > (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > > > return (featureRankedList)}
> > > >
> > > >
> > > >
> > > > I tried to adapt the idea from the above code in my own code, as
> > > > shown below.
> > > >
> > > > library(e1071)
> > > > library(caret)
> > > >
> > > > data<- read.csv("matrix.csv", header = TRUE)
> > > >
> > > > x <- data
> > > > y <- as.factor(data$Class)
> > > >
> > > > svmrfeFeatureRanking = function(x,y){
> > > >
> > > > #Checking for the variables
> > > > stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > > >
> > > > n = ncol(x)
> > > > survivingFeaturesIndexes = seq_len(n)
> > > > featureRankedList = vector(length=n)
> > > > rankedFeatureIndex = n
> > > >
> > > > while(length(survivingFeaturesIndexes)>0){
> > > > #train the support vector machine
> > > > svmModel = svm(x[, survivingFeaturesIndexes], y, cross=10,
> > > > cost = 10, type="C-classification", kernel="linear")
> > > >
> > > > #compute the weight vector
> > > > w = t(svmModel$coefs)%*%svmModel$SV
> > > >
> > > > #compute ranking criteria
> > > > rankingCriteria = w * w
> > > >
> > > > #rank the features
> > > > ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > > >
> > > > #update feature ranked list
> > > > featureRankedList[rankedFeatureIndex] =
> > > > survivingFeaturesIndexes[ranking[1]]
> > > > rankedFeatureIndex = rankedFeatureIndex - 1
> > > >
> > > > #eliminate the feature with smallest ranking criterion
> > > > (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > > >
> > > > return (featureRankedList)}
> > > >
> > > > But I could not get past the "update feature ranked list" stage.
> > > > Please guide me.
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Regards,
> > >
> > > Priyanka Purkayastha, M.Tech, Ph.D.,
> > > SERB National Postdoctoral Researcher
> > > Genomics and Systems Biology Lab,
> > > Department of Chemical Engineering,
> > > Indian Institute of Technology Bombay (IITB),
> > > Powai, Mumbai- 400076
> > >
> > >
> > >
> >
>
>