[R] help in SVM
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Jun 24 19:43:42 CEST 2010
Hi,
On Thu, Jun 24, 2010 at 1:22 PM, Changbin Du <changbind at gmail.com> wrote:
> HI, GUYS,
>
> I used the following codes to run SVM and get prediction on new data set hh.
>
> dim(all_h)
> [1] 2034 24
> dim(hh) # it contains all the variables besides the variables in all_h data set.
> [1] 640 415
If I understand you correctly, this is wrong.
You are supposed to hold out *observations* (rows) when doing
training/testing, not variables/predictors/features (cols).
Assuming e1071::svm doesn't do anything fancy to match column names
between training and test data, then to put it simply: the test set
should have the same columns (features per observation) that you
used in training.
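As a rough sketch of what I mean (toy data standing in for your all_h and hh; the values and the extra1/extra2 column names below are made up), you could check and align the columns in base R before calling predict():

```r
# Toy stand-ins for the real data (hypothetical values; same roles as all_h / hh).
set.seed(1)
all_h <- data.frame(DD = runif(6), HK = runif(6), out = c(0, 0, 1, 0, 1, 1))
hh    <- data.frame(HK = runif(4), DD = runif(4),
                    extra1 = runif(4), extra2 = runif(4))

# Predictors the model was trained on: every column except the response.
predictors <- setdiff(names(all_h), "out")

# A training predictor that is absent from the new data makes prediction impossible.
missing_cols <- setdiff(predictors, names(hh))
if (length(missing_cols) > 0) {
  stop("hh lacks training predictors: ", paste(missing_cols, collapse = ", "))
}

# Keep only the training predictors, in the training order.
hh_aligned <- hh[, predictors, drop = FALSE]

# predict.svm drops rows with NA by default (na.action = na.omit),
# so it is worth knowing how many incomplete rows there are.
n_incomplete <- sum(!complete.cases(hh_aligned))
```

With the real data you would then pass hh_aligned rather than hh as the new data in your predict() call.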
-steve
> require(e1071)
>
> svm.tune<-tune(svm, as.factor(out) ~ ., data=all_h,
> ranges=list(gamma=2^(-5:5), cost=2^(-5:5)))# find the best parameters.
>
> bestg<-svm.tune$best.parameters[[1]]
> bestc<-svm.tune$best.parameters[[2]]
>
> svm.fit<-svm(as.factor(out) ~ ., data=all_h, method="C-classification",
> kernel="radial", probability = TRUE, cost=bestc, gamma=bestg, cross=10) # model fitting
>
> svm.pred<-predict(svm.fit, hh, decision.values = TRUE, probability = TRUE) # find the probability.
>
> Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns, :
>   invalid 'ncol' value (too large or NA)
>
>
>> head(all_h)
> DD HK HQ IL LP NE NP TA TP WA WC
> 1 0.00543 0 0 0.00815 0.00272 0.00543 0.00000 0.00000 0.00000 0.00000 0
> 3 0.00000 0 0 0.00890 0.00890 0.00712 0.00534 0.00000 0.00890 0.00178 0
> 4 0.00448 0 0 0.00448 0.00299 0.00448 0.00149 0.00299 0.00000 0.00149 0
> 5 0.00312 0 0 0.00467 0.00467 0.00000 0.00156 0.00467 0.00312 0.00467 0
> 6 0.00587 0 0 0.02053 0.00587 0.00000 0.00293 0.00587 0.00293 0.00000 0
> 7 0.00000 0 0 0.02422 0.00346 0.00000 0.00346 0.00346 0.00000 0.00346 0
> WD WG WN YW acid_per base_per charge_per
> 1 0.00000 0.00000 0.00000 0.00000 0.14402174 0.12228261 0.019021739
> 3 0.00178 0.00178 0.00534 0.00178 0.12277580 0.09252669 0.016014235
> 4 0.00149 0.00448 0.00448 0.00000 0.16591928 0.11509716 0.022421525
> 5 0.00000 0.00156 0.00000 0.00156 0.13084112 0.10903427 0.009345794
> 6 0.00293 0.00000 0.00000 0.00000 0.07038123 0.08797654 0.002932551
> 7 0.00000 0.00346 0.00000 0.00346 0.05536332 0.08650519 0.010380623
> hydrophob_per polar_per num_cell num_genes position out
> 1 0.3804348 0.1929348 1 4 1 0
> 3 0.3540925 0.2508897 1 4 3 0
> 4 0.3393124 0.2032885 1 4 4 1
> 5 0.3753894 0.2305296 2 7 1 0
> 6 0.4868035 0.1964809 2 7 2 0
> 7 0.4878893 0.1522491 2 7 3 0
>
>> quantile(hh$HK)
> 0% 25% 50% 75% 100%
> 0.00000 0.00000 0.00000 0.00000 0.02703
>> quantile(hh$HQ)
> 0% 25% 50% 75% 100%
> 0.000 0.000 0.000 0.000 0.025
>> quantile(hh$WC)
> 0% 25% 50% 75% 100%
> 0.00000 0.00000 0.00000 0.00000 0.01266
>
> Can someone give some suggestions?
>
> Thanks!
>
> --
> Sincerely,
> Changbin
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact