[BioC] RandomForest, supervised machine learning and uncertainty

Moshe Olshansky m_olshansky at yahoo.com
Wed Dec 8 23:48:00 CET 2010


Hi January,

If the situation is as you describe, you do not need a third class. Stay with the two original classes and, when a new case arrives, assign it to class 1 if at least 60% (or some other threshold) of the votes are for class 1, assign it to class 2 if at least 60% (not necessarily the same threshold) of the votes are for class 2, and otherwise leave it undefined.
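A minimal R sketch of the two-threshold rule above, assuming the forest's output is available as a matrix of vote fractions (one row per case, one column per class), such as the matrix returned by predict(rf, newdata, type = "prob") in the randomForest package. The function name classify_with_doubt, the thresholds 0.60 and 0.65, and the toy vote matrix are hypothetical, for illustration only. Keeping every threshold above 0.5 guarantees that at most one class can qualify, so the same rule also works for more than two classes.

```r
# Hypothetical helper: assign each case to its top-voted class only if that
# class's vote fraction reaches its own threshold; otherwise mark it undefined.
# votes:      matrix of vote fractions, rows = cases, columns = classes
# thresholds: one threshold per class, in the same order as colnames(votes)
classify_with_doubt <- function(votes, thresholds, doubt_label = "undefined") {
  stopifnot(ncol(votes) == length(thresholds), all(thresholds > 0.5))
  top <- max.col(votes, ties.method = "first")          # top-voted class per case
  top_frac <- votes[cbind(seq_len(nrow(votes)), top)]   # its vote fraction
  ifelse(top_frac >= thresholds[top], colnames(votes)[top], doubt_label)
}

# Toy vote fractions for four cases (illustrative values, not real data)
votes <- matrix(c(0.80, 0.20,
                  0.55, 0.45,
                  0.30, 0.70,
                  0.40, 0.60),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("class1", "class2")))
res <- classify_with_doubt(votes, thresholds = c(0.60, 0.65))
print(res)  # "class1" "undefined" "class2" "undefined"
```

Because each class has its own threshold, the rule can be made stricter for the class whose misclassification is more costly.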

Best regards,
Moshe.

--- On Thu, 9/12/10, January Weiner <january.weiner at mpiib-berlin.mpg.de> wrote:

> From: January Weiner <january.weiner at mpiib-berlin.mpg.de>
> Subject: Re: [BioC] RandomForest, supervised machine learning and uncertainty
> To: "BioC" <bioconductor at stat.math.ethz.ch>
> Received: Thursday, 9 December, 2010, 12:01 AM
> Thank you, Vincent, for the answer.
> 
> > task, but if I read you correctly you are addressing the extension of
> > the decision task from two classes to two classes plus "doubt". This
> 
> Yes, although I do have more than two classes, and I would like to
> stick to random forests. Say, extend the RF decision task from N
> classes to N + 1 classes. The problem is well described in the
> discussion of the "safety threshold" in Ripley's book.
> 
> The simple solution is to define a "doubt function" d on the votes
> matrix from the RF, such as the one I mentioned, and then plot the
> size of the "doubt class" and the error rate in the remaining classes
> against a threshold on d. That would help in making a decision, or
> could itself count as a result for my study.
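One way to compute the curve described above, as a hedged R sketch. The doubt function used here, d = 1 - (largest vote fraction), is only an assumption standing in for the one mentioned earlier in the thread, which is not reproduced in this message; error_reject and the toy votes and labels are likewise hypothetical. With a fitted randomForest object, the out-of-bag vote matrix rf$votes together with the training labels would be a natural input. For each threshold, cases with d above it go to the doubt class, and the function reports the fraction of cases rejected and the error rate among the cases that remain.

```r
# Assumed doubt function (hypothetical choice): 1 minus the largest vote
# fraction; 0 means a unanimous forest, larger values mean more disagreement.
doubt_max <- function(votes) 1 - apply(votes, 1, max)

# For each threshold thr, cases with doubt > thr are moved to the "doubt"
# class. Returns one row per threshold: the fraction of cases rejected and
# the error rate among the cases that are still classified.
error_reject <- function(votes, truth, thresholds) {
  d <- doubt_max(votes)
  pred <- colnames(votes)[max.col(votes, ties.method = "first")]
  t(vapply(thresholds, function(thr) {
    keep <- d <= thr
    c(threshold   = thr,
      reject_rate = mean(!keep),
      error_rate  = if (any(keep)) mean(pred[keep] != truth[keep]) else NA_real_)
  }, FUN.VALUE = c(threshold = 0, reject_rate = 0, error_rate = 0)))
}

# Toy vote fractions and true labels (illustrative values, not real data)
votes <- matrix(c(0.90, 0.10,
                  0.70, 0.30,
                  0.55, 0.45,
                  0.20, 0.80,
                  0.48, 0.52),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("A", "B")))
truth <- c("A", "A", "B", "B", "A")
er_curve <- error_reject(votes, truth, thresholds = c(0.50, 0.35, 0.15))
print(er_curve)
```

Plotting the reject rate against the error rate, e.g. with plot(er_curve[, "reject_rate"], er_curve[, "error_rate"], type = "b"), gives an error-reject curve of the kind Ripley discusses. In the toy data both errors fall on the two most doubtful cases, so rejecting 40% of the cases brings the error rate among the rest from 0.4 to 0.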
> 
> 
> @Sean Davis:
> 
> > I'll just add here that when thinking about biomarker selection and
> > clinical prediction, one must be aware of the often imbalanced costs
> > (to the patient) of misclassification (which could include the
> > "unclassified" cases), depending on the actual details of the
> > clinical scenario.
> 
> This is precisely why I would like to consider the "doubt class". The
> costs of an unclassified result are definitely different from (and
> most likely lower than) the costs of a false negative.
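To make the cost argument concrete, here is a minimal R sketch of the usual minimum-expected-cost decision with a reject option: each possible prediction gets an expected cost computed from the votes, and a case is left unclassified when even the cheapest prediction costs more than an unclassified result. Treating the vote fractions as posterior probabilities is itself an assumption, since random-forest vote fractions are not necessarily well calibrated. The function decide_with_costs, the class names, the cost matrix (a false negative ten times as costly as a false positive) and the reject cost of 0.5 are all hypothetical.

```r
# Hypothetical helper: minimum-expected-cost decision with a reject option.
# votes:       matrix of vote fractions (rows = cases, columns = classes),
#              treated here as rough posterior probabilities
# costs[i, j]: cost of predicting class j when the true class is i
# reject_cost: cost of leaving a case unclassified ("doubt")
decide_with_costs <- function(votes, costs, reject_cost, doubt_label = "doubt") {
  stopifnot(identical(colnames(votes), rownames(costs)),
            identical(colnames(votes), colnames(costs)))
  # expected[k, j] = sum_i votes[k, i] * costs[i, j]: expected cost of
  # predicting class j for case k
  expected <- votes %*% costs
  best <- max.col(-expected, ties.method = "first")      # cheapest prediction
  best_cost <- expected[cbind(seq_len(nrow(expected)), best)]
  ifelse(best_cost > reject_cost, doubt_label, colnames(costs)[best])
}

classes <- c("healthy", "disease")
# Illustrative costs: a false negative (true disease called healthy) costs 10,
# a false positive costs 1, a correct call costs 0.
costs <- matrix(c(0,  1,
                  10, 0),
                nrow = 2, byrow = TRUE, dimnames = list(classes, classes))
votes <- matrix(c(0.99, 0.01,
                  0.90, 0.10,
                  0.20, 0.80),
                ncol = 2, byrow = TRUE, dimnames = list(NULL, classes))
decisions <- decide_with_costs(votes, costs, reject_cost = 0.5)
print(decisions)  # "healthy" "doubt" "disease"
```

With 0/1 misclassification costs and a reject cost c, this rule reduces to leaving a case unclassified whenever its largest vote fraction is below 1 - c, i.e. a single vote threshold like the one suggested in the reply at the top. With unequal costs, as in the toy example, a case with only 10% of the votes for "disease" is not called "healthy": the expected cost of missing a disease case (1.0) exceeds the reject cost, so it goes to "doubt" instead.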
> 
> Cheers,
> j.
> 
> 
> 
> > is discussed at some length in Ripley's "Pattern Recognition and
> > Neural Networks" book; see the comments on the "error-reject" curve
> > on p. 20 and on the "safety threshold" concept on p. 22.
> >
> > The MLInterfaces vignette has an application (that, as written,
> > turns out to be nugatory) just at the end of the vignette -- the
> > doubt interval is too narrow to capture any classification for the
> > data in use. If you change the code to
> >
> > douPred[smallDou(0.35, 0.65)] <- "doubt"
> >
> > one prediction is converted to "doubt". This issue deserves more
> > attention.
> >
> >
> >>
> >> Best regards,
> >>
> >> j.
> >>
> >> --
> >> -------- Dr. January Weiner 3 --------------------------------------
> >> Max Planck Institute for Infection Biology
> >> Charitéplatz 1
> >> D-10117 Berlin, Germany
> >> Web   : www.mpiib-berlin.mpg.de
> >> Tel     : +49-30-28460514
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >


