[R] help in SVM

Thu Jun 24 19:43:42 CEST 2010

Hi,

On Thu, Jun 24, 2010 at 1:22 PM, Changbin Du <changbind at gmail.com> wrote:
> Hi guys,
>
> I used the following code to run an SVM and get predictions on a new data set, hh.
>
>  dim(all_h)
> [1] 2034   24
>  dim(hh)    # contains all the variables besides those in the all_h data set
> [1] 640 415

If I understand you correctly, this is wrong.

You are supposed to hold out *observations* (rows) when doing
training/testing, not variables/predictors/features (cols).
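
A minimal sketch of what a row-wise hold-out looks like, using a toy
data frame (the names d, x1, x2 and the 30% split are made up for
illustration, not taken from your data):

```r
# Toy data frame standing in for the full data set (hypothetical values).
set.seed(1)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10), out = rep(0:1, 5))

# Hold out 3 of the 10 rows; every column stays in both pieces.
test_idx <- sample(nrow(d), size = 3)
train <- d[-test_idx, ]
test  <- d[test_idx, ]

# Both pieces have different rows but identical columns.
dim(train)
dim(test)
```

The point is only that the split happens along rows, so train and test
end up with the same set of columns.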

Assuming that e1071::svm doesn't do anything fancy with matching
column names between training and testing, then to put it simply: your
test set should have the same number of columns (features per
observation) as the data you trained on.
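
One way to check this before calling predict() is to compare column
names directly. This is only a sketch in base R with made-up toy data;
train, newdata, and the column called extra are hypothetical stand-ins,
and it assumes the response column is named out as in your formula:

```r
# Toy stand-ins for the training data and the new data (hypothetical values).
train   <- data.frame(DD = c(0.1, 0.2), HK = c(0, 0), out = c(0, 1))
newdata <- data.frame(HK = c(0, 0.01), DD = c(0.3, 0.4), extra = c(5, 6))

# Predictors are every training column except the response.
predictors <- setdiff(names(train), "out")

# Stop early if the new data lacks any predictor used in training.
missing_cols <- setdiff(predictors, names(newdata))
if (length(missing_cols) > 0) {
  stop("new data lacks predictors: ", paste(missing_cols, collapse = ", "))
}

# Keep only the training predictors, in the training order.
newdata_aligned <- newdata[, predictors, drop = FALSE]
names(newdata_aligned)
```

If missing_cols is non-empty for your real data, the new data set
simply doesn't carry the features the model was trained on.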

-steve

> require(e1071)
>
> # find the best parameters
> svm.tune <- tune(svm, as.factor(out) ~ ., data = all_h,
>                  ranges = list(gamma = 2^(-5:5), cost = 2^(-5:5)))
>
> bestg<-svm.tune$best.parameters[[1]]
> bestc<-svm.tune$best.parameters[[2]]
>
> # model fitting (svm() selects the SVM flavour via 'type', not 'method')
> svm.fit <- svm(as.factor(out) ~ ., data = all_h, type = "C-classification",
>                kernel = "radial", probability = TRUE, cost = bestc,
>                gamma = bestg, cross = 10)
>
> # find the probability
> svm.pred <- predict(svm.fit, hh, decision.values = TRUE, probability = TRUE)
> Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns,  :
>   invalid 'ncol' value (too large or NA)
>
>
>> head(all_h)
>        DD HK HQ      IL      LP      NE      NP      TA      TP      WA WC
> 1 0.00543  0  0 0.00815 0.00272 0.00543 0.00000 0.00000 0.00000 0.00000  0
> 3 0.00000  0  0 0.00890 0.00890 0.00712 0.00534 0.00000 0.00890 0.00178  0
> 4 0.00448  0  0 0.00448 0.00299 0.00448 0.00149 0.00299 0.00000 0.00149  0
> 5 0.00312  0  0 0.00467 0.00467 0.00000 0.00156 0.00467 0.00312 0.00467  0
> 6 0.00587  0  0 0.02053 0.00587 0.00000 0.00293 0.00587 0.00293 0.00000  0
> 7 0.00000  0  0 0.02422 0.00346 0.00000 0.00346 0.00346 0.00000 0.00346  0
>        WD      WG      WN      YW   acid_per   base_per  charge_per
> 1 0.00000 0.00000 0.00000 0.00000 0.14402174 0.12228261 0.019021739
> 3 0.00178 0.00178 0.00534 0.00178 0.12277580 0.09252669 0.016014235
> 4 0.00149 0.00448 0.00448 0.00000 0.16591928 0.11509716 0.022421525
> 5 0.00000 0.00156 0.00000 0.00156 0.13084112 0.10903427 0.009345794
> 6 0.00293 0.00000 0.00000 0.00000 0.07038123 0.08797654 0.002932551
> 7 0.00000 0.00346 0.00000 0.00346 0.05536332 0.08650519 0.010380623
>   hydrophob_per polar_per num_cell num_genes position out
> 1     0.3804348 0.1929348        1         4        1   0
> 3     0.3540925 0.2508897        1         4        3   0
> 4     0.3393124 0.2032885        1         4        4   1
> 5     0.3753894 0.2305296        2         7        1   0
> 6     0.4868035 0.1964809        2         7        2   0
> 7     0.4878893 0.1522491        2         7        3   0
>
>> quantile(hh$HK)
>     0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.02703
>> quantile(hh$HQ)
>   0%   25%   50%   75%  100%
> 0.000 0.000 0.000 0.000 0.025
>> quantile(hh$WC)
>     0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.01266
>
> Can someone give some suggestions?
>
> Thanks!
>
>
>
>
>
> --
> Sincerely,
> Changbin
> --

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact