[BioC] KNN, SVM, and randomForest - How to predict testing set without known categories (affy data)

Wed Jul 28 15:09:41 CEST 2004

Thanks Tom, Sean, Xavier for the reply, and especially Adai!
However, I still have a problem. To feed the microarray data into these supervised classification methods, an exprSet needs to be built. To build the exprSet, you need to give the class of every sample. So when I want to predict samples with unknown classes, how do I put them into the exprSet? Thank you!

Xin

-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
Sent: 28 July 2004 13:00
To: Liu, Xin
Cc: Tom R. Fahland; BioConductor mailing list
Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict
testing set without known categories

If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and
algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one
is the better algorithm? You need test sets with known classes to find
out. You can get them by splitting your learning set (samples with known
classes) into a training set and a test set. Look up "cross validation".

Some examples of built-in cross-validation:
* knn.cv() in library(class) is a leave-one-out cross-validation of knn()
* svm() in library(e1071) has an argument named 'cross' for k-fold
cross-validation
In practice, I prefer to write my own wrapper for cross-validation to
ensure that the sampling method is the same across all algorithms.
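
A minimal sketch of such a wrapper, under stated assumptions: the built-in
iris data stands in for an expression matrix (samples in rows), cv.error()
is a hypothetical helper name, and the two "algorithms" compared are simply
knn() with k = 1 and k = 5. The point is only that the fold assignment is
drawn once and reused, so every algorithm sees the same partitions.

```r
library(class)  # provides knn(); 'class' ships with standard R installations

# Stand-in data (assumption): iris instead of a real expression matrix.
x  <- as.matrix(iris[, 1:4])   # samples in rows, features in columns
cl <- iris$Species             # known classes of the learning set

# Assign every sample to a fold ONCE, so that each algorithm is
# evaluated on exactly the same partitions.
set.seed(1)
n.folds <- 5
fold <- sample(rep(seq_len(n.folds), length.out = nrow(x)))

# Hypothetical helper: 'fit.predict' is any function(train, cl, test)
# that returns predicted classes for the rows of 'test'.
cv.error <- function(fit.predict) {
  wrong <- 0
  for (f in seq_len(n.folds)) {
    held <- fold == f
    pred <- fit.predict(x[!held, , drop = FALSE], cl[!held],
                        x[held, , drop = FALSE])
    wrong <- wrong + sum(as.character(pred) != as.character(cl[held]))
  }
  wrong / nrow(x)  # overall misclassification rate
}

err <- c(knn1 = cv.error(function(tr, y, te) knn(tr, te, y, k = 1)),
         knn5 = cv.error(function(tr, y, te) knn(tr, te, y, k = 5)))
print(err)
```

Any other classifier can be compared on the same folds by passing another
function(train, cl, test) to cv.error().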

Once you have determined the best algorithm and features, you then use
predict() to predict samples with unknown classes.
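
As a hedged illustration of that last step: the classifier only needs class
labels for the training samples, and the samples to be predicted are passed
as a plain matrix of features. The sketch below uses knn() from
library(class), which trains and predicts in one call; iris again stands in
for expression data, and the object names (train, unknown, ...) are
assumptions made for the example.

```r
library(class)  # provides knn()

# Stand-in data (assumption): iris in place of expression values.
set.seed(1)
unknown.idx <- sample(nrow(iris), 10)  # pretend these 10 have no known class

train    <- as.matrix(iris[-unknown.idx, 1:4])  # samples with known classes
train.cl <- iris$Species[-unknown.idx]          # labels for training rows only
unknown  <- as.matrix(iris[unknown.idx, 1:4])   # features only, no labels

# Labels are supplied for the training rows only; 'unknown' carries none.
pred <- knn(train, unknown, train.cl, k = 3)
print(pred)

# With a fitted model object from e1071 or randomForest, the analogous
# step would be predict(model, unknown).
```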

Regards, Adai.

On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
> In R, before using KNN, SVM, and randomForest, an exprSet needs to be built, which requires both the training set WITH known categories and the test set WITH known categories. However, by definition, in supervised learning you always train (with known
> categories), then predict the test set WITHOUT known categories. I wonder how to implement this. Thank you!
> 
> Xin
> 
> -----Original Message-----
> From: Tom R. Fahland [mailto:tfahland at genomatica.com]
> Sent: 27 July 2004 18:48
> To: Liu, Xin; bioconductor at stat.math.ethz.ch
> Subject: RE: [BioC] KNN, SVM,and randomForest - How to predict samples
> without category 
> 
> 
> By definition, in supervised learning you always train (with known
> categories), then run your unbiased data through for prediction. Both CV
> and train/test partitions are good for choosing parameters and
> optimizing the algorithms. I have just completed a study predicting dose
> exposure with good results using different algorithms. 
> Tom
> 
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Liu, Xin
> Sent: Tuesday, July 27, 2004 07:39
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] KNN, SVM,and randomForest - How to predict samples
> without category 
> 
> 
> Dear all,
> 
> Supervised classification methods (KNN, SVM, and randomForest) use a
> test sample set and a training sample set to do prediction. To create
> the exprSet, the category is needed for each sample. However, sometimes
> we need to predict a sample without knowing its category. Does anybody
> have a clue how to do this? Thank you very much!
> 
> Best regards,
> Xin LIU
> 
> 
> 
> This e-mail is from ArraGen Ltd\ \ The e-mail and any files\...{{dropped}}
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 
