[R] p-values for classification
Arne.Muller at sanofi-aventis.com
Fri Jul 1 12:14:20 CEST 2005
I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix, from which I calculate the sensitivity and the false discovery rate (FDR). The classifiers are trained on 575 data points, and my test set has 50 data points.
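For reference, here is a minimal Python sketch of the two quantities (the post itself contains no code, and the counts in the example are made up). Sensitivity is TP / (TP + FN) and FDR is FP / (TP + FP), both read off the 2x2 confusion matrix. The `Confusion` class and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a 2x2 confusion matrix (positive class = 1)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def sensitivity(c: Confusion) -> float:
    """TP / (TP + FN); NaN if the test set has no actual positives."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else math.nan


def fdr(c: Confusion) -> float:
    """FP / (TP + FP); NaN if the classifier predicts no positives."""
    denom = c.tp + c.fp
    return c.fp / denom if denom else math.nan


# Hypothetical 50-point test set: 24 actual positives, 22 predicted positives.
c = Confusion(tp=18, fp=4, fn=6, tn=22)
print(sensitivity(c))    # 0.75
print(round(fdr(c), 4))  # 0.1818
```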
I'd like to calculate, for each classifier, p-values for obtaining an FDR at most as large as, and a sensitivity at least as large as, the observed values. I was thinking about shuffling (or bootstrapping) the labels of the test set, classifying the points, and calculating the p-values from the resulting random FDRs and sensitivities, assuming they are normally distributed.
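A minimal Python sketch of the shuffling idea, with one swap: instead of fitting a normal distribution to the shuffled values, it counts how often the shuffled statistic is at least as extreme as the observed one (the usual empirical permutation p-value, with +1 in numerator and denominator so it is never exactly 0). It assumes the classifier was trained only on the 575 training points, so its predictions on the test set do not depend on the test labels. Under that assumption the labels can be shuffled against fixed predictions and nothing has to be re-classified. `sens_fdr` and `permutation_pvalues` are hypothetical names.

```python
import math
import random


def sens_fdr(y_true, y_pred):
    """Sensitivity TP/(TP+FN) and FDR FP/(TP+FP) for 0/1 label sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if tp + fn else math.nan
    fdr = fp / (tp + fp) if tp + fp else math.nan
    return sens, fdr


def permutation_pvalues(y_true, y_pred, n_perm=10_000, seed=0):
    """Empirical p-values for P(sens >= observed) and P(FDR <= observed).

    Assumes y_pred are the fixed predictions of an already-trained classifier,
    so only the test labels are shuffled; nothing is re-classified.
    """
    rng = random.Random(seed)
    obs_sens, obs_fdr = sens_fdr(y_true, y_pred)
    labels = list(y_true)  # copy so the caller's labels are not reordered
    hits_sens = hits_fdr = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        s, f = sens_fdr(labels, y_pred)
        hits_sens += s >= obs_sens  # bool counts as 0 or 1
        hits_fdr += f <= obs_fdr
    # +1 in numerator and denominator counts the observed labelling itself.
    return (hits_sens + 1) / (n_perm + 1), (hits_fdr + 1) / (n_perm + 1)


# Hypothetical 50-point test set with a perfect classifier.
y_true = [1] * 24 + [0] * 26
print(permutation_pvalues(y_true, list(y_true), n_perm=2000))
```

Because each shuffle only re-scores 50 fixed predictions, this is cheap compared with re-running a classifier per round.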
The problem is that this is rather slow when running many rounds of shuffling and classification (I'd like to do it for many classifiers and parameter combinations). In addition, classifying the 50 test data points with shuffled labels realistically produces only a very limited number of possible FDRs and sensitivities, and I'm wondering whether I can really assume these values are normally distributed.
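On the discreteness point, assume the trained classifier's predictions on the test set are fixed and only the test labels are permuted. The number of true positives then follows a hypergeometric distribution (N test points, K actual positives, n predicted positives). Sensitivity = TP/K and FDR = (n - TP)/n are both functions of TP alone, so they can take at most min(n, K) + 1 distinct values. This small Python sketch enumerates that exact null distribution; `tp_null_pmf` and the example counts are hypothetical.

```python
from math import comb


def tp_null_pmf(n_total: int, n_pos: int, n_pred_pos: int) -> dict[int, float]:
    """Exact distribution of TP when the test labels are randomly permuted
    against fixed predictions: Hypergeometric(n_total, n_pos, n_pred_pos)."""
    lo = max(0, n_pos + n_pred_pos - n_total)  # smallest possible TP
    hi = min(n_pos, n_pred_pos)                # largest possible TP
    denom = comb(n_total, n_pred_pos)
    return {
        k: comb(n_pos, k) * comb(n_total - n_pos, n_pred_pos - k) / denom
        for k in range(lo, hi + 1)
    }


# Hypothetical test set: 50 points, 24 actual positives, 22 predicted positives.
pmf = tp_null_pmf(50, 24, 22)
print(len(pmf))  # 23 possible TP values, hence 23 possible sensitivities/FDRs
for tp, p in pmf.items():
    print(f"TP={tp:2d}  sens={tp / 24:.3f}  FDR={(22 - tp) / 22:.3f}  P={p:.4g}")
```

Listing the probabilities this way makes it easy to see how well (or badly) a normal curve matches, particularly in the tails where small p-values come from.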
Basically, I'm looking for a way to calculate the p-values analytically. I'd be grateful for any suggestions, web addresses, or references.
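One analytic route, sketched in Python, assumes the trained classifier's predictions on the test set are fixed and only the test labels are permuted. TP is then hypergeometric. Sensitivity = TP/K rises with TP while FDR = (n - TP)/n falls with it, so P(sensitivity >= observed) and P(FDR <= observed) are the same upper tail, P(TP >= observed TP). That tail is the one-sided Fisher's exact test p-value for the 2x2 confusion matrix, so no shuffling is needed. `exact_pvalue` is a hypothetical helper and the example counts are made up.

```python
from math import comb


def exact_pvalue(tp: int, fp: int, fn: int, tn: int) -> float:
    """P(TP >= observed tp) when test labels are permuted against fixed predictions.

    This equals both P(sensitivity >= observed) and P(FDR <= observed),
    and is the one-sided Fisher's exact test p-value for the confusion matrix.
    """
    n_total = tp + fp + fn + tn
    n_pos = tp + fn    # K: actual positives in the test set
    n_pred = tp + fp   # n: predicted positives
    hi = min(n_pos, n_pred)
    # Sum the hypergeometric upper tail in exact integer arithmetic, divide once.
    tail = sum(
        comb(n_pos, k) * comb(n_total - n_pos, n_pred - k)
        for k in range(tp, hi + 1)
    )
    return tail / comb(n_total, n_pred)


# Hypothetical confusion matrix for a 50-point test set.
print(exact_pvalue(tp=18, fp=4, fn=6, tn=22))
```

In R, the same tail should be obtainable with `phyper(tp - 1, K, N - K, n, lower.tail = FALSE)`, or with `fisher.test(..., alternative = "greater")` on the confusion matrix laid out with TP and FP in the first row and FN and TN in the second.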