[BioC] package for predicting a continuous variable from more than one continuous predictor variable
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Sep 9 16:45:32 CEST 2009
Hi,
On Sep 9, 2009, at 10:38 AM, shirley zhang wrote:
> Hi Steve,
>
> Thanks for your explanation and suggestions. I didn't know SVM could
> also be used for regression, since I have only used it for
> classification.
Yeah, no problem. It's pretty straightforward to wire up an SVM for
regression -- you'll have to run it a few times with different values
of "epsilon" (like you would for C (or nu) in SVM classification).
If you're interested in some details/theory, here's a "brief tutorial"
on support vector regression by Alex Smola and Bernhard Scholkopf:
http://eprints.pascal-network.org/archive/00002057/01/SmoSch03b.pdf
Let us know if you need help (but maybe R-help might be more
appropriate?).
> I will try those methods you suggested. Do you have any experience
> with CART?
Nope, I've never used CART before, sorry.
-steve
>
> Thanks again,
> Shirley
>
> On Wed, Sep 9, 2009 at 10:26 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi Shirley,
>>
>> On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:
>>
>>> Thanks Steve.
>>>
>>> Sorry that I did not make myself clear. I am trying to build a
>>> biomarker from gene expression microarray data. What I am doing is
>>> similar to the weighted-voting algorithm or SVM, but the difference
>>> is that the outcome is a continuous variable instead of a
>>> categorical variable. It is a regression problem, and I want to
>>> know which package is best for this purpose. How about CART?
>>
>> I don't know if there's such a thing as "best". What yardstick would
>> you use to measure that?
>>
>> For instance, you mention "it" is similar to an SVM (how?), but SVMs
>> can also be used for regression, not just classification (doable
>> with both e1071 and kernlab). How about going that route? As usual,
>> interpretation of the model might be challenging, though (which
>> might be why you're avoiding it for biomarker discovery?)
>>
>> You also mention weighted-voting:
>>
>> * how about boosted regression models?
>> http://cran.r-project.org/web/packages/gbm/index.html
>>
>> * Also related to boosting: bagging & random forests (both can be
>> used for regression):
>> http://cran.r-project.org/web/packages/randomForest/index.html
>> http://cran.r-project.org/web/packages/ipred/index.html
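To make the randomForest suggestion above concrete -- an untested
sketch, again assuming an expression matrix x (samples in rows) and a
numeric outcome y; with a numeric response, randomForest fits a
regression forest:

  library(randomForest)

  ## numeric y => regression forest; importance = TRUE records
  ## per-gene importance scores, which helps with interpretation
  rf <- randomForest(x, y, ntree = 1000, importance = TRUE)

  rf$predicted           # out-of-bag predictions
  imp <- importance(rf)  # variable (gene) importance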
>>
>> I think boosting/bagging/random forests tend to lead to more
>> interpretable models (e.g. via variable importance measures), so
>> maybe that's a better fit for you?
>>
>> There are also several penalized regression packages (also good for
>> interpretability); for instance, glmnet is great:
>> http://cran.r-project.org/web/packages/glmnet/index.html
>>
>> Maybe you have some info about how your predictors group? If so,
>> try the group lasso:
>> http://cran.r-project.org/web/packages/grplasso/index.html
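And the penalized-regression route (again an untested sketch with the
same x and y): cv.glmnet chooses the penalty by cross-validation, and
the nonzero coefficients are your candidate markers:

  library(glmnet)

  ## lasso (alpha = 1) with a gaussian (continuous) response;
  ## cross-validation picks the penalty lambda
  cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1)

  ## coefficients at the CV-selected lambda; the nonzero ones are
  ## the selected genes
  coef(cvfit, s = "lambda.min")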
>>
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact