[BioC] CMA package: variable selection with limma
Renaud Gaujoux
renaud at mancala.cbio.uct.ac.za
Fri Mar 6 16:08:47 CET 2009
Hi,
I used the CMA package (v1.0.0) to evaluate the performance of a kNN
classification after a feature selection step using limma, including the
whole procedure in the cross-validation.
Before finding the CMA package I had implemented myself the same procedure
(cross-validation of feature selection) and I couldn't get the same result
as with CMA.
Looking to the CMA code I found that the feature were selected by their rank
based on the statistic computed by the function classifyTestsF from limma
(see the CMA code below). My understanding is that this function computes a
F-statistic to test the joint significance of all the coefficients in the
regression model (in this case 2: the intercept and the contrast of
interest, i.e. the difference between the two groups of samples). In my
procedure I had ranked the genes based on the p-values of the single
coefficient of interest (not the intercept), as suggested by the limma
tutorials.
Indeed the selected genes in the first step are different.
Is there a reason for also looking at the joint significance (intercept +
coeff)?
I guess the fact intercept is significantly different from zero gives us the
information that the gene is actually expressed at a certain level, making
us more confident that what is observed is not noise?
In standard regression problem (more samples than variables) I would indeed
test first the significance of the model as a whole (F-test), but in the
context of microarray I had never seen this practice before. Am I missing
something?
Which practice is recommended in this context?
Thanks
------------------
limmatest <- function(X, y, learnind, ...){
require(limma, quietly=TRUE)
yprime <- y[learnind]
Xprime <- X[learnind,,drop=FALSE]
yprime <- as.factor(yprime)
des <- model.matrix(~yprime)
limo <- lmFit(t(Xprime), des)
outp <- eBayes(limo, ...)
stat <- classifyTestsF(object = outp, fstat.only = TRUE)
attr(stat, "df1") <- NULL
attr(stat, "df2") <- NULL
new("varseloutput", varsel=stat)
}
More information about the Bioconductor
mailing list