[R] Strange question/result about SVM

Noah Silverman noah at smartmediacorp.com
Mon Sep 14 20:14:48 CEST 2009


Ravi,

If you don't like my questions, simply don't answer them.

--
N

On 9/14/09 10:12 AM, Ravi Varadhan wrote:
> Noah,
>
> It may be just me, but how do "any" of your questions on prediction
> modeling relate to R?
>
> It seems to me that you have been getting a lot of "free" consulting from
> this list, which is supposed to be a forum for help on R-related issues.
>
> Ravi.
>
> -----------------------------------------------------------------------------------
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvaradhan at jhmi.edu
>
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
>
>
> ------------------------------------------------------------------------------------
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Noah Silverman
> Sent: Monday, September 14, 2009 1:00 PM
> To: r help
> Subject: [R] Strange question/result about SVM
>
> Hello,
>
> I have a very unusual situation with an SVM and wanted to get the
> group's opinion.
>
> We set up an experiment in which we train an SVM on one data set (the
> training data) and then test it on a completely independent data set
> (the test data).  The results were VERY good.
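>
> For concreteness, here is a hypothetical sketch of that workflow (this is
> not our real code or data; I am using the e1071 package purely for
> illustration):
>
>     library(e1071)
>
>     ## Made-up data standing in for our real training and test sets.
>     set.seed(42)
>     train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
>     train$y <- factor(train$x1 + train$x2 + rnorm(200, sd = 0.5) > 0)
>     test  <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
>     test$y <- factor(test$x1 + test$x2 + rnorm(50, sd = 0.5) > 0)
>
>     ## Fit on the training data (features scaled with training statistics),
>     ## then score the completely independent test data.
>     fit  <- svm(y ~ ., data = train, scale = TRUE)
>     pred <- predict(fit, newdata = test)
>     table(pred, test$y)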
>
> I then found an error in how we generate one of our training variables: we
> discovered that it was indirectly influenced by future events.  Clearly
> that needed to be fixed.  Fixing the variable immediately changed our
> results from good to terrible.  (Not a surprise, since the erroneous
> variable carried information about the future.)
>
> A friend, who knows NOTHING of statistics or math, innocently asked,
> "Why don't you just keep that variable, since it seems to make your
> results so much better?"  The idea, while naive, got me thinking.  We
> can include the future-influenced variable in the training set, since
> those events have already occurred, but what do we do with the test data
> from today?  As an experiment, I tried simply setting the variable in the
> test data to the average of its values in the training data.  The results
> were great!  Since the data is scaled, and the variable is now a constant
> (the training-data average), it scales to 0 in the test set.  Still,
> great results.
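>
> To illustrate the scaling point: a test-set feature fixed at the training
> mean scales to exactly 0 when the training centre and spread are reused
> (which is, for example, what e1071::svm() does when scale = TRUE).  A
> minimal, made-up sketch, not my actual code:
>
>     set.seed(1)
>     train_x <- rnorm(100, mean = 5, sd = 2)  # stand-in for the leaky training variable
>     test_x  <- rep(mean(train_x), 20)        # test copies fixed at the training average
>
>     centre <- mean(train_x)                  # scaling parameters taken from training data
>     spread <- sd(train_x)
>
>     scaled_test_x <- (test_x - centre) / spread
>     range(scaled_test_x)                     # every value is exactly 0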
>
> To summarize:
>
> Bad var in training + Bad var in testing = great results.
> Good var in training + Good var in testing = bad results.
> Bad var in training + Constant in testing = great results.
>
>
> I'm not an expert on the internals of SVMs, but the bad variable is
> clearly setting some kind of threshold or intercept when the model is
> fit.  Can someone help me figure out why/how this is working?
>
> Thanks!
>
> --
> N
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
