[R] training svm
Oldrich Kruza
sixtease at gmail.com
Fri Mar 7 08:17:31 CET 2008
Hello Soumyadeep,
If you store the data in a tabular file, then I suggest using standard
text-processing tools like cut. Say your file is called data.csv, the
fields are separated with commas, and you want to get rid of the third
and sixth columns (note that --complement is a GNU cut option):
$ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
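If you do not know in advance which columns hold the same value on every
row, something like the following might find them for you. It is only a
sketch: constant_columns is a name made up for this example, and it
assumes a comma-separated file with no header line and no quoted fields
that contain commas.

```shell
# Hypothetical helper (not a standard tool): print the 1-based indices of
# all columns whose value is identical on every row, joined by commas.
# Assumes comma-separated input without a header line.
constant_columns() {
    awk -F',' '
        # first row: remember every value and assume each column is constant
        NR == 1 {
            n = NF
            for (i = 1; i <= n; i++) { first[i] = $i; same[i] = 1 }
            next
        }
        # later rows: a column stops being constant as soon as one value
        # differs (appending "" forces a string comparison)
        {
            for (i = 1; i <= n; i++)
                if (($i "") != first[i]) same[i] = 0
        }
        END {
            out = ""
            for (i = 1; i <= n; i++)
                if (same[i]) out = out (out == "" ? "" : ",") i
            print out
        }
    ' "$1"
}
```

The result has the form that the --fields option of cut expects, so it
could be passed on directly:
$ cut --complement --delimiter="," --fields="$(constant_columns data.csv)" < data.csv > data_cut.csv
If no column is constant, the function prints an empty line, so check
for that before handing the result to cut.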
If you're not in a Unix environment but have Perl, then you may use a
script like:
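use strict;
use warnings;

open my $src, '<', 'data.csv' or die "couldn't open source: $!";
open my $dst, '>', 'data_cut.csv' or die "couldn't open destination: $!";
while (my $line = <$src>) {
    chomp $line;
    # substitute the comma for the delimiter you use; -1 keeps trailing empty fields
    my @fields = split /,/, $line, -1;
    # indices are zero-based (5 instead of 6, 2 instead of 3); remove the
    # higher index first so the lower one still points at the original column
    splice @fields, 5, 1;   # get rid of sixth column
    splice @fields, 2, 1;   # get rid of third column
    print {$dst} join(",", @fields), "\n";
}
close $dst or die "couldn't close destination: $!";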
If you need to do the selection within R, then you can do it by
indexing the data structure. Suppose you have the data in a data.frame
called data. Then:
> data <- data[,-6]
> data <- data[,-3]
might do the trick (but since I'm not much of an R hacker, this comes
without guarantee). However, I think it is better to do the
preprocessing before the data get into R, because then you avoid
loading the columns you are going to discard into memory.
Hope this helps
~ Oldrich
On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi
<soumyadeep_nandi at yahoo.com> wrote:
> Thanks Oldrich,
> Actually I was not sure if I can remove these columns and build a model.
> Thanks a lot for your kind suggestion. Could you tell me if there is any
> function to remove these columns from the data matrix?
>
> With best regards,
> Soumyadeep
>
>
> Oldrich Kruza <sixtease at gmail.com> wrote:
> A rather technical workaround I see could be adding a row with a
> different value. But if a column only ever has one value, then it
> contributes nothing to the model and I see no reason why it would have
> to be kept.
> ~ Oldrich Kruza
>
> On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi
> wrote:
> > What should I do if I need to train svm() with data having the same
> > value across all rows in some columns? These must be the important
> > features of the class and we can't exclude these columns to build up
> > models.
> >
> > The error I am getting is:
> > Error in predict.svm(ret, xhold) : Model is empty!
> > In addition: Warning message:
> > In svm.default(datatrain, classtrain) :
> > Variable(s) 'F112' and 'F113'.... [... truncated]
> >
> > Is there any way to overcome this problem? Any suggestions would be highly
> > helpful.
> >
> > Regards
> > Soumyadeep
More information about the R-help mailing list