[BioC] about label permutation test for binary classification

Joern Toedling toedling at ebi.ac.uk
Thu Sep 13 16:08:47 CEST 2007


Hello,

I am a bit puzzled about what you actually want to ask.

James Anderson wrote:
> For a binary classification problem in microarray data, suppose you do random subsampling classification: each time you split the data into 80% training and 20% test with stratification (preserving the ratio of each class), and repeat many times. When you get some results, one thing you would normally look at is how significantly different your results are from what you would get by chance; that's why people do a label permutation test. My question is: is the final accuracy from the label permutation test equal to the proportion of the larger class? Say there are 80 normal vs. 20 disease: is the mean accuracy of the label permutation test equal to 80/(80+20), as long as you repeat enough times? Is this classifier-independent?
>   

The mean accuracy of your classifier after label permutation, in a
cross-validation setting presumably, depends very much on your
classifier. What you should contrast it to is the accuracy of the naive
classifier "assign every sample to the larger class", 80% in your case.
A good reason for label permutation in your case is that you want to
assess the classifier's generalizability, because one can always
construct a classifier that has an accuracy of 100% on the training
data, but performs badly on independent test data. That is one reason
why people do label permutation with classification: the classifier's
mean accuracy in a cross-validation setting gives a better estimate of
its accuracy on test data than its accuracy on the training data does.
(You have to make sure that you do not use any aspect of the set-aside
test data for training the classifier, though.) An even better estimate
for your classifier's performance, however, would be its accuracy on a
completely independent test data set. Cross-validation on your training
data could then be used to select parameters of your classifier, if
needed.
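
To make the contrast concrete, here is a minimal Python sketch of a
label permutation test on repeated stratified 80/20 splits. Everything
in it is an assumption for illustration: the data are simulated (one
feature, 80 normal vs. 20 disease), nearest_centroid is only a toy
stand-in for whatever classifier you actually use, all function names
are made up, and the "+1" p-value formula is a common convention rather
than something from this thread.

```python
import random
from collections import Counter
from statistics import mean


def stratified_split(y, test_frac, rng):
    """Split indices into train/test, preserving the class ratio."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def majority_classifier(train_x, train_y, test_x):
    """Naive baseline: assign every sample to the larger training class."""
    majority = Counter(train_y).most_common(1)[0][0]
    return [majority for _ in test_x]


def nearest_centroid(train_x, train_y, test_x):
    """Toy one-feature classifier: predict the class with the closest mean."""
    centroids = {}
    for c in sorted(set(train_y)):
        vals = [x for x, v in zip(train_x, train_y) if v == c]
        centroids[c] = sum(vals) / len(vals)
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


def mean_accuracy(x, y, classifier, rng, n_splits=20, test_frac=0.2):
    """Mean test accuracy over repeated stratified splits."""
    accs = []
    for _ in range(n_splits):
        train, test = stratified_split(y, test_frac, rng)
        pred = classifier([x[i] for i in train], [y[i] for i in train],
                          [x[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(pred, test)) / len(test))
    return mean(accs)


def permutation_test(x, y, classifier, rng, n_perm=100):
    """Compare observed accuracy with accuracies under shuffled labels."""
    observed = mean_accuracy(x, y, classifier, rng)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # breaks any link between features and labels
        null.append(mean_accuracy(x, y_perm, classifier, rng))
    # Assumed convention: count the observed value as one of the permutations.
    p_value = (1 + sum(a >= observed for a in null)) / (1 + n_perm)
    return observed, mean(null), p_value


# Simulated data (an assumption): 80 normal, 20 disease, one shifted feature.
rng = random.Random(0)
y = ["normal"] * 80 + ["disease"] * 20
x = [rng.gauss(0, 1) for _ in range(80)] + [rng.gauss(3, 1) for _ in range(20)]

for name, clf in [("majority", majority_classifier),
                  ("centroid", nearest_centroid)]:
    obs, null_mean, p = permutation_test(x, y, clf, rng)
    print(f"{name}: observed={obs:.3f} permuted mean={null_mean:.3f} p={p:.3f}")
```

The majority classifier scores 80% with or without permutation, because
shuffling labels does not change the class counts. For any other
classifier the permuted mean need not be 80%, which is why the answer to
"is this classifier independent?" is no.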

Hope this helps.
Regards,
Joern

> Thanks a lot!
>
> James
>
>        
> ---------------------------------
> Building a website is a piece of cake. 
>   

well, classification sometimes isn't.


