[BioC] Re: Classification question(Tom R. Fahland)

Fri Apr 30 18:35:32 CEST 2004

Sorry, I don't quite understand why you are treating a dose response as a
classification problem.

What are you trying to achieve?

Stephen 

-----Original Message-----
From: Tarca Adi Laurentiu
To: bioconductor at stat.math.ethz.ch
Sent: 30/04/04 15:04
Subject: [BioC] Re: Classification question(Tom R. Fahland)

 >>Date: Thu, 1 Apr 2004 15:47:48 -0800
 >>From: "Tom R. Fahland" <tfahland at genomatica.com>
 >>Subject: [BioC] Classification question

 >>All

 >>I had a quick question about how you might best solve a classification
 >>problem. I have some ideas, but wanted to run it by the group to see
 >>their thoughts. I have animal data containing different doses of a
 >>substance and also have multiple time points for each dose (with
 >>replicates). I am interested in classifying the samples based on dose
 >>amount. I am experimenting with non-linear techniques like neural nets,
 >>etc. Now this problem is straightforward if you have only one time point
 >>per dose: just group similar doses together and train the network. But
 >>it's a little more tricky with multiple time points. What do you think
 >>is the best way to fully utilize all the data for dosage classification?
 >>How would you use/incorporate the multiple time points?

 >>Thanks
 >>Tom

Hi Tom,
If I understand correctly, there are C levels of dose (predefined classes)
into which your hybridizations fall.
Then, perhaps you consider only a reduced set of, say, Ng (most regulated)
genes (always the same ones) and want to use their (normalized) M values at
the Nt different time points to predict the class.
So each sample may be viewed as an Ng x Nt matrix of features available for
the classification, and your problem is mostly how to reduce the number of
features.
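A minimal Python sketch of that data layout, assuming the M values are already normalized and stored as a samples x genes x time points array. The sizes (12 samples, Ng = 20, Nt = 10, C = 3) and the random values are hypothetical placeholders, not real data. Most classifiers expect one flat feature vector per sample, so each Ng x Nt matrix is concatenated row by row.

```python
import numpy as np

# Hypothetical sizes: 12 hybridizations, Ng = 20 genes, Nt = 10 time points.
n_samples, n_genes, n_times = 12, 20, 10

# Synthetic placeholder M values (log-ratios); real values would come from
# your normalized arrays. Shape: (samples, genes, time points).
rng = np.random.default_rng(0)
M = rng.normal(size=(n_samples, n_genes, n_times))

# Each sample is an Ng x Nt matrix; flatten it row by row so that
# feature index = gene * Nt + time point.
X = M.reshape(n_samples, n_genes * n_times)

# Hypothetical dose class (one of C levels) for each hybridization.
C = 3
y = np.repeat(np.arange(C), n_samples // C)

print(X.shape)  # (12, 200)
```

With this layout every one of the Ng x Nt (gene, time point) pairs is a separate column, which is exactly the feature set the dimensionality reduction has to work on.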

There are mainly two types of dimensionality reduction methods: feature
extraction and feature selection.
You may perform feature extraction with, e.g., Principal Component
Analysis, reducing the Nt dimensions of your data to, let's say, only 2
(the first two principal components), but you will still have Ng x 2
features to input into your classifier.
With feature selection you may select, among all Ng x Nt features, those
that are the most "relevant" for classification without altering their
meaning (whereas PCA replaces them with linear combinations).
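One possible reading of the PCA suggestion, sketched in Python with NumPy only: every (sample, gene) time profile is treated as one observation of Nt variables, PCA is computed by SVD on the centered profiles, and only the first two component scores are kept, giving Ng x 2 features per sample. Pooling all genes into one PCA (rather than one PCA per gene) is an assumption, as is the synthetic input. In an honest evaluation the components should be fitted on training samples only.

```python
import numpy as np

def pca_over_time(M, k=2):
    """Project each gene's Nt-point time profile onto the first k principal
    components, turning (samples, Ng, Nt) into (samples, Ng, k)."""
    n_samples, n_genes, n_times = M.shape
    # Every (sample, gene) time profile is one observation of Nt variables.
    profiles = M.reshape(-1, n_times)
    centered = profiles - profiles.mean(axis=0)
    # Rows of Vt are the principal directions in time-point space,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:k].T
    return scores.reshape(n_samples, n_genes, k)

rng = np.random.default_rng(1)
M = rng.normal(size=(12, 20, 10))  # synthetic stand-in: samples x Ng x Nt
Z = pca_over_time(M, k=2)
X_reduced = Z.reshape(12, -1)      # Ng x 2 = 40 features per sample
print(X_reduced.shape)  # (12, 40)
```

The 200 original features shrink to 40, but each new feature is a weighted mix of time points rather than a single measured value, which is the loss of meaning that feature selection avoids.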
I can provide you with a MATLAB implementation of a feature selection
algorithm that uses the n-fold cross-validated accuracy of a nearest
neighbor classifier as its relevance measure, and a sequential method such
as sequential forward selection or "plus l take away r" as the
combinatorial optimization algorithm (maximizing the relevance). Since you
have only a small number of samples, I believe it will work fine for
Ng=20 x Nt=10 = 200 features, or even more.
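The following is not the MATLAB code offered above, only a minimal Python sketch of the same idea under stated assumptions: the relevance of a feature subset is the n-fold cross-validated accuracy of a 1-nearest-neighbor classifier with Euclidean distance, and the search is plain sequential forward selection (the simpler of the two methods named; "plus l take away r" is not implemented here). The fold count, the stopping rule and the function names are illustrative choices.

```python
import numpy as np

def cv_nn_accuracy(X, y, n_folds=3, seed=0):
    """n-fold cross-validated accuracy of a 1-nearest-neighbor classifier
    (Euclidean distance), used as the relevance of a feature subset."""
    rng = np.random.default_rng(seed)  # same folds for every subset compared
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Squared distances from each test sample to each training sample.
        diff = X[test_idx][:, None, :] - X[train_idx][None, :, :]
        d = (diff ** 2).sum(axis=2)
        pred = y[train_idx][d.argmin(axis=1)]
        correct += (pred == y[test_idx]).sum()
    return correct / len(y)

def sequential_forward_selection(X, y, max_features, n_folds=3):
    """Greedy SFS: repeatedly add the single feature that most improves the
    cross-validated 1-NN accuracy; stop when no addition improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = [(cv_nn_accuracy(X[:, selected + [f]], y, n_folds), f)
                  for f in remaining]
        score, f = max(scores, key=lambda t: t[0])  # first best on ties
        if score <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score

# Usage on synthetic data: 3 hypothetical dose classes x 4 replicates,
# 200 flattened (gene, time point) features, one planted informative feature.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 4)
X = rng.normal(size=(12, 200))
X[:, 7] += 10 * y
selected, acc = sequential_forward_selection(X, y, max_features=5)
print(selected, acc)
```

Because the same cross-validation accuracy both chooses and scores the subset, the returned accuracy is optimistic; with few samples, an outer cross-validation loop around the whole selection gives a fairer estimate.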
Once the features are selected you may use them with any supervised 
classifier.

Laurentiu

----------------------------------------------
Dr. Laurentiu Adi Tarca
Post Doc. in Bioinformatics
Forest Biology Research Center
C-E-Marchand Bld, 3113
Laval University
Quebec, (Qc)
G1K-7P4
Tel: 656-2131 ext. 4509
e-mail: ltarca at rsvs.ulaval.ca

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager (it.support at wibr.ucl.ac.uk). All files are scanned for viruses.