(Note that I'm taking this back to the mailing list in case others are 
interested.)

Orthogonal.  One strategy is:

1. Randomly separate the data into training and test sets (using
whatever proportions you think are appropriate for your size of
dataset).
2. On the training set, use the combination of polr + step to find an
optimal model.
3. Repeat this lots of times.
4. Collect data on how often each predictor gets selected in the
optimal model (which depends on the exact composition of the training
set).
5. Also collect data on how well each trained model fits its test
data.  (This is tricky with an ordinal outcome.  The key question is
how to weight the penalties for prediction errors that are off by one
ordinal category as opposed to two or more categories.  You might want
something like a weighted Cohen's kappa.)
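A weighted kappa of this sort can be computed directly.  Below is a
minimal sketch in R using quadratic weights, so an off-by-one error is
penalized less than an off-by-two error; the three-level labels and the
toy observed/predicted vectors are purely illustrative.

```r
## Quadratically weighted Cohen's kappa for an ordinal outcome.
## Agreement weights: w[i,j] = 1 - ((i - j)/(k - 1))^2, so exact matches
## get weight 1, off-by-one gets 0.75 (for k = 3), off-by-two gets 0.
weighted_kappa <- function(observed, predicted, levels) {
  k <- length(levels)
  obs  <- factor(observed,  levels = levels)
  pred <- factor(predicted, levels = levels)
  conf   <- table(obs, pred) / length(obs)        # observed proportions
  expect <- outer(rowSums(conf), colSums(conf))   # expected under independence
  w <- 1 - (outer(seq_len(k), seq_len(k), "-") / (k - 1))^2
  1 - (1 - sum(w * conf)) / (1 - sum(w * expect))
}

lev  <- c("healthy", "moderate", "severe")
obs  <- c("healthy", "healthy",  "moderate", "severe", "severe")
pred <- c("healthy", "moderate", "moderate", "severe", "healthy")
weighted_kappa(obs, pred, lev)
```

An existing implementation (I believe cohen.kappa in the psych package
offers a weighted version) would do as well; the point is only that the
weights make one-category misses cheaper than two-category misses.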

Finally, you need to summarize the cross-validation results to decide 
which predictors are best.  With only five possible predictors, one 
idea is simply to combine the out-of-sample prediction results for each 
of the 2^5 possible model structures (or whatever subset actually gets 
selected some number of times).  The other possibility is to say that 
every time a predictor gets selected in one of the optimal models, it 
gets credit (or blame) for all of the predictions those models make.  
Intuitively, I prefer the first of these alternatives.

     Kevin

On 6/24/2011 12:37 PM, Tim Triche, Jr. wrote:
> you prefer AIC to cross-validation for model selection?  or feel
> they're orthogonal?
>
> thanks for the tip about polr, I had a vague recollection of it, but 
> this is the first time I actually read the man page.  appreciate your 
> taking the time to send it.
>
> --t
>
>
> On Fri, Jun 24, 2011 at 10:26 AM, Kevin R. Coombes 
> <kevin.r.coombes@gmail.com <mailto:kevin.r.coombes@gmail.com>> wrote:
>
>     The standard MASS package includes the "polr" function to perform
>     ordinal regression.  After running polr to fit the base model with
>     all parameters, you can pass the results through the "step"
>     function to use AIC to select the best set of predictors.
>
>        Kevin
>
>
>     On 6/24/2011 10:38 AM, Tim Triche, Jr. wrote:
>
>         You have an ordinal response, so you might consider an ordered
>         probit model with interaction terms and a penalized likelihood
>         fit, and determine the best penalty by cross-validation.  I
>         don't recall whether CMA supports ordered probit models, but
>         it's probably the best approach, and you could just
>         brute-force it -- you've only got 120 different models to fit
>         under this scheme.  At the very least, CMA would generate the
>         cross-validation sets for you.
>
>         You might also want to consider recursively fitting a shrunken
>         LDA model (diseased/healthy, moderate/severe) and seeing how
>         that compares to an ordinal model.  Regardless,
>         cross-validation is the obvious answer to how to pick one.
>
>         Hope this helps,
>         -t
>
>         On Fri, Jun 24, 2011 at 8:24 AM, David
>         martin<vilanew@gmail.com <mailto:vilanew@gmail.com>>  wrote:
>
>             thanks.
>             It's not binary since I have three categories and 5 genes.
>             I have tried LDA and stepclass:
>
>             # LDA stepwise
>             disc <- stepclass(Group ~ ., data = dataf, method = "lda",
>                               improvement = 0.001)
>
>             where Group contains my three categories ("healthy",
>             "moderate disease", "severe disease") and dataf the PCR
>             values for my 5 genes.
>
>             The problem I have is that stepwise generates a different
>             signature each time (as it randomly picks a gene to start
>             with).  This is OK for me, but how many times do you need
>             to run stepclass before you have found the most probable
>             genes that classify your groups?  Do I need to run
>             stepclass in a loop?
>
>             thanks
>
>
>
>             On 06/24/2011 05:17 PM, Kevin R. Coombes wrote:
>
>                 .. and probably should ...
>
>                 For a binary classification with only a few
>                 predictors, you can, for example, use logistic
>                 regression with some standard criterion like AIC, BIC,
>                 or Bayesian model averaging to decide which predictors
>                 should be retained.
>
>                 Kevin
>
>                 On 6/23/2011 6:10 PM, Moshe Olshansky wrote:
>
>                     If you have just 5 genes and a decent number of
>                     samples you can use any of the "conventional"
>                     (i.e. not high-throughput) methods like LDA,
>                     trees, Random Forest, SVM, etc.
>
>                      I will have a look at both packages.  It's PCR
>                     data by the way
>
>                         thanks
>
>                         On 06/23/2011 05:56 PM, Tim Triche, Jr. wrote:
>
>                             or CMA, which is perhaps a more systematic
>                             approach to classification (the package
>                             name stands for Classification of
>                             MicroArrays).  Very well thought out.
>
>
>                             On Thu, Jun 23, 2011 at 8:02 AM, Sean
>                             Davis<sdavis2@mail.nih.gov
>                             <mailto:sdavis2@mail.nih.gov>>
>                             wrote:
>
>                              On Thu, Jun 23, 2011 at 10:58 AM, David
>
>                                 martin<vilanew@gmail.com
>                                 <mailto:vilanew@gmail.com>>
>                                 wrote:
>
>                                     Hi,
>                                     I have 5 genes of interest. I
>                                     would like to know which
>                                     combination(s) of genes gives the
>                                     best disease separation. Which
>                                     test could I use in my training
>                                     set to see which combination is
>                                     the best classifier between my
>                                     disease and my healthy population.
>
>                                     Thanks for any comment or test
>                                     that could be useful to answer
>                                     that question.
>
>                                 Check out the MLInterfaces package. It
>                                 should give you some ideas on
>                                 where to start.
>
>                                 Sean
>
>                                 _______________________________________________
>                                 Bioconductor mailing list
>                                 Bioconductor@r-project.org
>                                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                                 Search the archives:
>                                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> -- 
> When you emerge in a few years, you can ask someone what you missed, 
> and you'll find it can be summed up in a few minutes.
>
> Derek Sivers <http://sivers.org/berklee>
>


