[BioC] Re: KNN, SVM, and randomForest - How to predict testing set without known categories (affy data)

Kasper Daniel Hansen k.hansen at biostat.ku.dk
Wed Jul 28 20:37:36 CEST 2004


Adaikalavan Ramasamy <ramasamy at cancer.org.uk> writes:

> I do not know much about exprSet (please correct me if I am wrong), but I
> think of and treat an exprSet as a matrix. Indeed, in my previous message I
> was writing in the context of a matrix.
>
> data(affybatch.example)
> a <- rma(affybatch.example)
> m <- exprs(a)
>
> Then I work with 'm' which may or may not be what you want. 
>
> If you want to force a matrix to exprSet, the examples in
> help("exprSet") might be helpful.

An exprSet is a matrix of expression values coupled with a data frame
of covariates. If you (original poster) look at the aforementioned
article, you will see that they use the original exprSet (let's call it
Edata) in the following way:
  Xdata <- t(exprs(Edata))
  Ydata <- pData(Edata)["y-values"]
So you do not really need the exprSet object, as it is only used to
get the matrix of expression values and the data frame of classes. Now,
given that you have a fit (which you have constructed using a training
data set with known classes), you predict the classes of the test
samples with something like
  predict(fit, newdata=Xdata.test)

I suggest looking at the code and trying to separate the different
components.
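
To make the separation concrete, here is a minimal sketch on simulated
data. It is not the code from the article: lda() from MASS is only a
stand-in for whatever classifier you fit, and the gene names, class
labels "A"/"B", and the helper make_samples() are made up for
illustration. The point is only that the class labels enter when the fit
is built, while the test matrix carries expression values and nothing else.

```r
library(MASS)  # recommended package shipped with R; provides lda()

set.seed(1)
genes <- paste0("gene", 1:5)  # hypothetical gene names

# Simulated stand-in for t(exprs(Edata)): rows are samples, columns are genes.
make_samples <- function(n, shift) {
  matrix(rnorm(n * length(genes), mean = shift), nrow = n,
         dimnames = list(NULL, genes))
}

# Training set: expression values plus known classes.
Xdata.train <- rbind(make_samples(10, 0), make_samples(10, 5))
Ydata.train <- factor(rep(c("A", "B"), each = 10))

# Test set: expression values only; no class labels are supplied anywhere.
Xdata.test <- rbind(make_samples(3, 0), make_samples(3, 5))

# The labels are used here, when the fit is constructed ...
fit <- lda(Xdata.train, grouping = Ydata.train)

# ... and prediction needs nothing but the new expression matrix.
pred <- predict(fit, newdata = Xdata.test)$class
print(pred)
```

With real data, Xdata.test would be t(exprs(...)) of the unlabelled
arrays, with the same genes in the same column order as the training
matrix.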

/Kasper

> Regards, Adai.
>
>
> On Wed, 2004-07-28 at 14:09, Liu, Xin wrote:
>> Thanks Tom, Sean, Xavier for the reply, and especially Adai!
>> However, I still have a problem. To feed the microarray data into these supervised methods, an exprSet needs to be built. To build an exprSet, you need to give the class of every sample. So when I predict samples with unknown classes, how do I put them into the exprSet? Thank you!
>> 
>> Xin
>> 
>> 
>> 
>> -----Original Message-----
>> From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
>> Sent: 28 July 2004 13:00
>> To: Liu, Xin
>> Cc: Tom R. Fahland; BioConductor mailing list
>> Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict
>> test without known categories
>> 
>> 
>> If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and
>> algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one
>> is the better algorithm? You use a test set with known classes to do
>> this. You can do so by splitting your learning set (samples with known
>> classes) into training and test sets. Look up "cross-validation".
>> 
>> Some examples of built-in cross-validation:
>> * knn.cv() is a leave-one-out cross-validation of knn()
>> * svm() in library(e1071) has an argument named 'cross' for
>>   cross-validation
>> In practice, I prefer to write my own wrapper for cross-validation to
>> ensure that the sampling method is the same across all algorithms.
>> 
>> Once you have determined the best algorithm and features, you then use
>> predict() to predict samples with unknown classes.
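
A rough sketch of what such a cross-validation wrapper could look like,
on simulated data. This is not Adai's code: the helper cv_error(), the
fold count, and the choice of knn() and lda() as the two competing
algorithms are assumptions made for illustration. The one idea taken
from the text is that the fold assignment is drawn once and reused, so
every algorithm is scored on exactly the same splits.

```r
library(class)  # knn(), knn.cv(); recommended package shipped with R
library(MASS)   # lda(); recommended package shipped with R

set.seed(42)
n <- 40
p <- 5
y <- factor(rep(c("No", "Yes"), each = n / 2))
# Simulated learning set: rows are samples; "Yes" samples are shifted by 2.
x <- matrix(rnorm(n * p), n, p) + 2 * (y == "Yes")

# Draw the fold assignment once so all algorithms see identical splits.
k_folds <- 5
fold <- sample(rep(seq_len(k_folds), length.out = n))

# Hypothetical wrapper: predict_fun(train_x, train_y, test_x) returns classes.
cv_error <- function(predict_fun) {
  wrong <- 0
  for (f in seq_len(k_folds)) {
    test <- fold == f
    pred <- predict_fun(x[!test, , drop = FALSE], y[!test],
                        x[test, , drop = FALSE])
    wrong <- wrong + sum(as.character(pred) != as.character(y[test]))
  }
  wrong / n  # overall misclassification rate
}

knn_fun <- function(xtr, ytr, xte) knn(xtr, xte, cl = ytr, k = 3)
lda_fun <- function(xtr, ytr, xte) {
  predict(lda(xtr, grouping = ytr), newdata = xte)$class
}

errors <- c(knn = cv_error(knn_fun), lda = cv_error(lda_fun))
print(errors)

# For comparison, the built-in leave-one-out version for knn:
loo_error <- mean(knn.cv(x, cl = y, k = 3) != y)
print(loo_error)
```

The algorithm with the lower error on the shared folds is the one you
would then refit on the whole learning set before calling predict() on
the samples with unknown classes.
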
>> 
>> Regards, Adai.
>> 
>> 
>> 
>> On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
>> > In R, before using KNN, SVM, and randomForest, an exprSet needs to be built, which requires a training set WITH known categories and a test set WITH known categories. However, by definition, in supervised learning you always train (with known
>> > categories), then predict the test set WITHOUT known categories. I wonder how to implement this. Thank you!
>> > 
>> > Xin
>> > 
>> > 
>> > 
>> > 
>> > 
>> > -----Original Message-----
>> > From: Tom R. Fahland [mailto:tfahland at genomatica.com]
>> > Sent: 27 July 2004 18:48
>> > To: Liu, Xin; bioconductor at stat.math.ethz.ch
>> > Subject: RE: [BioC] KNN, SVM,and randomForest - How to predict samples
>> > without category 
>> > 
>> > 
>> > By definition, in supervised learning you always train (with known
>> > categories), then run your unbiased data through for prediction. Both CV
>> > and train/test partitions are good for choosing parameters and
>> > optimizing the algorithms. I have just completed a study predicting dose
>> > exposure with good results using different algorithms.
>> > Tom
>> > 
>> > -----Original Message-----
>> > From: bioconductor-bounces at stat.math.ethz.ch
>> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Liu, Xin
>> > Sent: Tuesday, July 27, 2004 07:39
>> > To: bioconductor at stat.math.ethz.ch
>> > Subject: [BioC] KNN, SVM,and randomForest - How to predict samples
>> > without category 
>> > 
>> > 
>> > Dear all,
>> > 
>> > Supervised clustering methods (KNN, SVM, and randomForest) use a test
>> > sample set and a training sample set to do prediction. To create the
>> > exprSet, the category is needed for each sample. However, sometimes we
>> > need to predict a sample without its category. Does anybody have a clue
>> > how to do this? Thank you very much!
>> > 
>> > Best regards,
>> > Xin LIU
>> > 
>> > 
>> > 
>> > This e-mail is from ArraGen Ltd. The e-mail and any files ...{{dropped}}
>> > 
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>> > 
>

-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen


