[R] Levels in new data fed to SVM
Claus O'Rourke
claus.orourke at gmail.com
Thu Jan 10 14:57:52 CET 2013
Thanks for clarifying!
On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:
>
>
> On 08.01.2013 21:14, Claus O'Rourke wrote:
>>
>> Hi all,
>> I've encountered an issue using svm (e1071) in the specific case of
>> supplying new data which may not have the full range of levels that
>> were present in the training data.
>>
>> I've constructed this really primitive example to illustrate the point:
>>
>>> library(e1071)
>>> training.data <- data.frame(x = c("yellow","red","yellow","red"), a =
>>> c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
>>> my.model <- svm(x ~ .,data=training.data)
>>> test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b =
>>> c("a", "b"))
>>> predict(my.model,test.data)
>>
>> Error in predict.svm(my.model, test.data) :
>> test data does not match model !
>>>
>>>
>>> levels(test.data$b) <- levels(training.data$b)
>>> predict(my.model,test.data)
>>
>> 1 2
>> yellow red
>> Levels: red yellow
>>
>> In the first case test.data$b does not have the level "c" and this
>> results in the input data being rejected. I've debugged this down to
>> the point of model matrix creation in the SVM R code. Once I fill up
>> the levels in the test data with the levels from the original data,
>> then there is no problem at all.
>>
>> Assuming my test data has to come from another source, where fewer
>> category levels may appear than in the original training data, is
>> there a better way I should be handling this?
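
A small base-R sketch of why the shapes disagree. It does not reproduce e1071's internal code; it only illustrates the dummy-coding step described above using model.matrix(). The names train and test are made up for the illustration, and stringsAsFactors = TRUE is spelled out because the data.frame() default changed to FALSE in R 4.0.

```r
# Hypothetical stand-ins for the training and new data (column b only).
train <- data.frame(b = c("a", "b", "a", "c"), stringsAsFactors = TRUE)
test  <- data.frame(b = c("a", "b"), stringsAsFactors = TRUE)

# A factor built from new data only knows the values it has seen.
levels(train$b)  # "a" "b" "c"
levels(test$b)   # "a" "b"

# Dummy coding derives its columns from the factor levels, so the two
# design matrices end up with different columns.
train.cols <- colnames(model.matrix(~ b, data = train))
test.cols  <- colnames(model.matrix(~ b, data = test))
train.cols  # "(Intercept)" "bb" "bc"
test.cols   # "(Intercept)" "bb"
```

The missing "bc" column is the kind of mismatch that a model fitted on the training columns cannot reconcile with the new data.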
>
>
>
> You have to tell the factor about all possible levels; the data do not
> necessarily contain an example of each one.
> That means:
>
> levels(test.data$b) <- c("a", "b", "c")
> predict(my.model,test.data)
>
> will help.
>
> Best,
> Uwe Ligges
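
One caveat that may be worth sketching: assigning to levels() relabels the existing levels by position, so it is only safe when the levels present in the new data are the leading levels of the training factor, in the same order (as in the example above, where "a", "b" precede "c"). A possibly more robust variant, assuming the training levels are still available, is to rebuild the factor with factor(..., levels = ...). The names train.b and test.b below are hypothetical.

```r
train.b <- factor(c("a", "b", "a", "c"))  # levels: a, b, c
test.b  <- factor(c("b", "c"))            # levels: b, c ("a" never appears)

# Assigning levels relabels by position: the first existing level ("b")
# becomes "a" and the second ("c") becomes "b", silently changing the data.
relabelled <- test.b
levels(relabelled) <- levels(train.b)
as.character(relabelled)  # "a" "b"

# Rebuilding the factor with the training levels keeps the values and
# only extends the set of allowed levels.
aligned <- factor(as.character(test.b), levels = levels(train.b))
as.character(aligned)  # "b" "c"
levels(aligned)        # "a" "b" "c"
```

With this approach, any value in the new data that was never seen in training becomes NA, which at least makes the problem visible rather than mislabelling it.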
>
>
>
>> Thanks
>>
>