[R] Help with SVM package Kernlab

David Winsemius dwinsemius at comcast.net
Sat Dec 26 15:10:34 CET 2009


Perhaps my response is not showing up on the r-help list due to the  
large number of recipients generated by the reply-all option  
triggering some sort of spam filter. So I am trimming them.

On Dec 26, 2009, at 3:53 AM, Vishal Thapar wrote:

> Hi All,
>
> Thank you for your replies so far. I was hoping I could get some  
> more input from you on this issue. It seems to me that I have hit a  
> dead end here and would really appreciate some feedback. I have  
> followed all the suggestions you have mentioned but this is still  
> stuck. Earlier I thought that it was a "factor" issue but now  
> even that is not the error. Here is the script and the error. Thanks  
> for your help. I have attached the sample test file as well as the  
> training file in case you would like to run it locally.
> ---------------------------------
> library(seqinr)
> library("kernlab")
>
> ### Reading in the data
> mars500_1_fasta = read.fasta("toClassify500_1.fasta")
> # get the sequences from the fasta object
> mars500_1_seq = t(getSequence(mars500_1_fasta))
> # convert it to a data frame
> mars500_1_df = as.data.frame(mars500_1_seq,stringsAsFactors=FALSE)
> # add the Class field to the data frame for classification
> class = append(rep("+",times=128),rep("-",times=128))
> mars500_1_df = cbind(Class=class,mars500_1_df)
> # finally apply the factor() function
> mars500_1_df = data.frame(lapply(mars500_1_df,factor))
> #####
> ##### Call the ksvm() function to create a model
> mars500_1 <- ksvm(Class ~ ., data = mars500_1_df, kernel = "rbfdot",  
> kpar = "automatic", C = 60, cross = 3, prob.model = TRUE)

 > str(mars500_1_df)
'data.frame':	256 obs. of  501 variables:
All of which are factors with 4 levels

> testSeq_fa=read.fasta("temp1.fasta")
> testSeq_seq=t(getSequence(testSeq_fa))
> testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
> testSeq_df = cbind(Class="-",testSeq_df)
> testSeq_df = data.frame(lapply(testSeq_df,factor))
 > str(testSeq_df)
'data.frame':	20 obs. of  501 variables:

$ V9   : Factor w/ 3 levels "a","c","t": 2 1 2 1 3 2 3 2 3 1 ...
$ V26  : Factor w/ 3 levels "a","g","t": 2 1 1 1 1 3 1 3 1 3 ...
...and about 10 more...
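
The str() output above suggests that some test columns carry fewer levels than their training counterparts, presumably because a base never occurs at that position in the 20 test sequences. One way to list such columns is sketched here on small made-up data frames. The names train_df, test_df and level_mismatch() are hypothetical and not part of the original scripts.

```r
# Toy stand-ins for the training and test data frames (made-up data).
train_df <- data.frame(
  V1 = factor(c("a", "c", "g", "t")),
  V2 = factor(c("t", "g", "c", "a"))
)
test_df <- data.frame(
  V1 = factor(c("a", "c", "t", "a")),  # "g" never occurs: only 3 levels
  V2 = factor(c("a", "c", "g", "t"))   # all 4 levels present
)

# Hypothetical helper: names of shared columns whose level sets differ.
level_mismatch <- function(train, test) {
  cols <- intersect(names(train), names(test))
  differs <- vapply(
    cols,
    function(nm) !identical(levels(train[[nm]]), levels(test[[nm]])),
    logical(1)
  )
  cols[differs]
}

bad_cols <- level_mismatch(train_df, test_df)
print(bad_cols)
```

On the real objects the call would presumably be level_mismatch(mars500_1_df, testSeq_df).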

So I think you were closer but not quite there yet.
 > for(i in 11:501){if (length(levels(testSeq_df[,i])) == 3)
                levels(testSeq_df[,i])<- c(a="a",g="g",c="c",c="t")}

 > predict(mars500_1,testSeq_df)
[1] - - - - + - + - - - - + + + - - - - - +
Levels: - +
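
An alternative that does not depend on counting levels is to rebuild each test column with the level set of the matching training column, using factor(x, levels = ...). The sketch below uses made-up data frames. align_levels() is a hypothetical helper, and it assumes the test data contain no values that are absent from training (those would become NA). Keep in mind that assigning a plain character vector to levels() relabels the existing levels by position, so the order of the new labels matters; the last lines of the sketch illustrate this.

```r
# Toy stand-ins for the training and test data frames (made-up data).
train_df <- data.frame(
  V1 = factor(c("a", "c", "g", "t")),
  V2 = factor(c("t", "g", "c", "a"))
)
test_df <- data.frame(
  V1 = factor(c("a", "c", "t", "a")),  # "g" missing: 3 levels
  V2 = factor(c("a", "c", "g", "t"))
)

# Hypothetical helper: give every shared factor column in `test`
# exactly the levels (and level order) used in `train`.
align_levels <- function(test, train) {
  for (nm in intersect(names(train), names(test))) {
    if (is.factor(train[[nm]])) {
      test[[nm]] <- factor(as.character(test[[nm]]),
                           levels = levels(train[[nm]]))
    }
  }
  test
}

aligned <- align_levels(test_df, train_df)
print(levels(aligned$V1))        # same level set as train_df$V1
print(as.character(aligned$V1))  # the data values are unchanged

# Contrast: positional relabelling changes what the values mean.
relabelled <- test_df$V1
levels(relabelled) <- c("a", "c", "g", "t")
print(as.character(relabelled))  # the former "t" entries are now "g"
```

With the objects from this thread, this would be something like testSeq_df <- align_levels(testSeq_df, mars500_1_df) before calling predict().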

YES, it WAS (and still is) a "factor issue".

>
> testSeq_fa=read.fasta("temp1.fasta")
> testSeq_seq=t(getSequence(testSeq_fa))
> testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
> testSeq_df = cbind(Class="-",testSeq_df)
> testSeq_df = data.frame(lapply(testSeq_df,factor))
> predict(mars500_1,testSeq_df)
>
> Error in .local(object, ...) : test vector does not match model !
>
> Thanks in advance.
>
> Sincerely,
>
> Vishal
>
>
> On Fri, Dec 25, 2009 at 8:10 AM, Vishal Thapar  
> <vishalthapar at gmail.com> wrote:
> Hi,
>
> I seem to have made some headway on this problem but it's still not  
> solved. It seems like this is a "factor" issue. When I read my  
> training set, I read it with read.csv() which converts each of the  
> columns to "factors". From this if I take a single row as my
<snipped>
> Williams Bldg
>
> 1 Bungtown Road
> Cold Spring Harbor, NY - 11724
>
> <toClassify500_1.fasta><temp1.fasta>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



