[R] Help with SVM package Kernlab

Vishal Thapar vishalthapar at gmail.com
Sat Dec 26 09:53:41 CET 2009


Hi All,

Thank you for your replies so far. I was hoping I could get some more input
from you on this issue. It seems to me that I have hit a dead end here and
would really appreciate some feedback. I have followed all the suggestions
you have mentioned, but I am still stuck. Earlier I thought that it
was a "factor" issue but now even that is not the error. Here is the script
and the error. Thanks for your help. I have attached the sample test file as
well as the training file in case you would like to run it locally.
---------------------------------
library(seqinr)
library("kernlab")

### Reading in the data
mars500_1_fasta = read.fasta("toClassify500_1.fasta")
# get the sequences from the fasta object
mars500_1_seq = t(getSequence(mars500_1_fasta))
# convert it to a data frame
mars500_1_df = as.data.frame(mars500_1_seq,stringsAsFactors=FALSE)
# add the Class field to the data frame for classification
class = append(rep("+",times=128),rep("-",times=128))
mars500_1_df = cbind(Class=class,mars500_1_df)
# finally apply the factor() function to every column
mars500_1_df = data.frame(lapply(mars500_1_df,factor))
#####
##### Call the ksvm() function to create a model
mars500_1 <- ksvm(Class ~ ., data = mars500_1_df, kernel = "rbfdot", kpar =
"automatic", C = 60, cross = 3, prob.model = TRUE)

testSeq_fa=read.fasta("temp1.fasta")
testSeq_seq=t(getSequence(testSeq_fa))
testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
testSeq_df = cbind(Class="-",testSeq_df)
testSeq_df = data.frame(lapply(testSeq_df,factor))
predict(mars500_1,testSeq_df)

Error in .local(object, ...) : test vector does not match model !
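
One possible cause, offered as an assumption rather than a confirmed diagnosis: calling factor() separately on the test data builds factors whose levels come only from the values present in the test sequence, so the design matrix built at prediction time can have different columns from the one built at training time. The base-R sketch below uses made-up toy data, and align_levels() is a hypothetical helper, not part of kernlab or seqinr. It rebuilds each test column with the levels of the matching training column.

```r
# Toy training data standing in for the sequence data frame (made-up values).
train <- data.frame(
  Class = factor(c("+", "+", "-", "-")),
  V1 = factor(c("a", "c", "g", "t")),
  V2 = factor(c("a", "a", "c", "g"))
)

# A single test sequence; factor() on one row gives one-level factors.
test <- data.frame(V1 = "g", V2 = "c", stringsAsFactors = FALSE)
test_naive <- data.frame(lapply(test, factor))

# Building a design matrix from one-level factors fails; keep the message.
naive_result <- tryCatch(
  model.matrix(~ V1 + V2, data = test_naive),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: give every shared factor column the training levels.
align_levels <- function(newdata, reference) {
  for (col in intersect(names(newdata), names(reference))) {
    if (is.factor(reference[[col]])) {
      newdata[[col]] <- factor(as.character(newdata[[col]]),
                               levels = levels(reference[[col]]))
    }
  }
  newdata
}
test_aligned <- align_levels(test, train)

# With aligned levels, both design matrices have the same number of columns.
n_train_cols <- ncol(model.matrix(~ V1 + V2, data = train))
n_test_cols <- ncol(model.matrix(~ V1 + V2, data = test_aligned))
```

If this assumption holds for the real data, the same idea would be to run align_levels(testSeq_df, mars500_1_df) before predict(), in place of the lapply(testSeq_df, factor) step. A base in the test sequence that never occurs at that position in the training set would become NA and would need separate handling.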

Thanks in advance.

Sincerely,

Vishal


On Fri, Dec 25, 2009 at 8:10 AM, Vishal Thapar <vishalthapar at gmail.com> wrote:

> Hi,
>
> I seem to have made some headway on this problem, but it's still not solved.
> It seems like this is a "factor" issue. When I read my training set, I read
> it with read.csv(), which converts each of the columns to factors. From
> this, if I take a single row as my testSeq, it works great. On the other
> hand, when I read in my test sequence from a Fasta file, I am using the
> "seqinr" package's function read.fasta(), or, if I read a sequence directly
> from a file, I am using scan(), e.g.:
>
> # reading the training set
> train500 = read.csv("toClassify500_1.csv",header=TRUE)
> modelforSVM <- ksvm(Class ~ ., data = train500, kernel = "rbfdot", kpar =
> "automatic", C = 60, cross = 3, prob.model = TRUE)
> Now if I do:
> tindex =sample(1:dim(train500)[1], 1)
> testSeq=train500[tindex,]
> predict(modelforSVM, testSeq);
> It works great.
>
> BUT if I do:
>
> my.file=file("chr4_seqs.fasta", open="r")
> # read the data from a fasta file using scan()
> chr4Seq = scan(my.file,list("",""),nlines=2)
>
>     seqId = chr4Seq[[1]];
>     testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
>     # the s2c function just converts the "STRING" to char vector "S" "T" "R" "I" "N" "G"
>
>
> predict(modelforSVM, testSeq);
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :   contrasts
> can be applied only to factors with 2 or more levels
> -------------------------
> If I apply factor() to testSeq, it still doesn't work, e.g.:
>
> testSeq=data.frame(lapply(testSeq,factor))
>
> I still get:
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :   contrasts
> can be applied only to factors with 2 or more levels
>
> Another thing I tried was reading the fasta file using the read.fasta()
> function and taking a sample input from the training set itself:
>
> # read a fasta file via the seqinr package
> data500_1_fasta = read.fasta("toClassify500.fasta")
> # get the sequences from it: 256 sequences, first 128 are +, next 128 are -
> data500_1_seq = t(getSequence(data500_1_fasta))
> # make a data frame from it
> data500_1_df = as.data.frame(data500_1_seq)
> # add the class column to it
> class = append(rep("+",times=128),rep("-",times=128))
> data500_1_df = cbind(Class=class,data500_1_df)
> # finally apply factor() on the data frame
> data500_1_df = data.frame(lapply(data500_1_df,factor))
>
> #Now train and get the model
>
> modelforSVM <- ksvm(Class ~ ., data = data500_1_df, kernel = "rbfdot", kpar
> = "automatic", C = 60, cross = 3, prob.model = TRUE)
>
> and finally:
> tindex =sample(1:dim(data500_1_df)[1], 1)
> testSeq=data500_1_df[tindex,]
>
> predict(modelforSVM, testSeq);
>
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :   contrasts
> can be applied only to factors with 2 or more levels
>
> I am very confused at this point. What am I doing wrong? How do I use the
> factor() function properly so that I don't get this error? Am I in the right
> direction at all?
>
> Thanks in anticipation of your help.
>
> -vishal
>
>
>
>


-- 
Vishal Thapar, Ph.D.
Post Doctoral Researcher
Cold Spring Harbor Lab
Williams Bldg

1 Bungtown Road
Cold Spring Harbor, NY - 11724

