[BioC] significance of "wrong" clustering of differential genes

Benjamin Otto b.otto at uke.uni-hamburg.de
Mon Nov 13 15:57:05 CET 2006



Please imagine the following situation: 

For two sample sets (set1, set2), the most differentially expressed genes are
identified with limma, using "holm" as the p-value adjustment method.
Afterwards a heatmap is drawn for these genes. The procedure looks like this:


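> ## group factor taken from the phenotype column of interest
> f <- factor(as.character(pheno[,marker]))
> design <- model.matrix(~f)
> fit <- eBayes(lmFit(eSet,design))
> ## sort.by="none" keeps the rows of tab in the same order as eSet
> tab <- topTable(fit, coef=2, number=nrow(eSet), adjust.method="holm", sort.by="none")
> ## the log fold change column is called logFC in current limma (M in older releases)
> selected <- tab$adj.P.Val < 0.01 & abs(tab$logFC) >= 1
> ## print a heatmap for eSet[selected,]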
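
The heatmap step itself is only indicated by a comment above. A minimal
sketch of what it might look like follows, using base R only and a simulated
matrix as a hypothetical stand-in for exprs(eSet)[selected, ]; the object
names (x, grp, hc, cl) and the simulated values are assumptions made for
illustration, not part of the original analysis.

```r
## Hypothetical stand-in for exprs(eSet)[selected, ]: 20 genes x 8 samples
set.seed(1)
grp <- factor(rep(c("set1", "set2"), each = 4))
x <- matrix(rnorm(20 * 8), nrow = 20,
            dimnames = list(paste0("gene", 1:20), paste0(grp, "_", 1:8)))
x[, grp == "set2"] <- x[, grp == "set2"] + 2   # shift set2 upwards

## Sample clustering with the defaults that heatmap() also uses
## (Euclidean distance, complete linkage, computed before row scaling)
hc <- hclust(dist(t(x)))
cl <- cutree(hc, k = 2)
table(cluster = cl, group = grp)

## Heatmap with a colour bar marking the known set of each sample
heatmap(x, ColSideColors = c("grey30", "grey80")[as.integer(grp)],
        scale = "row")
```

The cross-table of cluster against group makes a "wrong" assignment visible
without having to read it off the dendrogram.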



What can lead to a misclassification in the clustering, say one sample of
set1 being clustered together with set2? After all, according to the workflow
I have explicitly been searching for the genes which should discriminate
between the two sets! However, the expression values displayed in the heatmap
suggest that this sample IS more similar to the "wrong" set than to the true
one. (Have a look at the jpg.)
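
The situation described above can be reproduced on simulated data, which may
help in reasoning about it. The sketch below is an illustration under stated
assumptions, not the analysis from this post: it uses ordinary Welch t-tests
in place of limma's moderated statistics, and all sizes, effect sizes and
names (n.genes, n.per.set, de, lfc) are made up. One set1 sample is given a
set2-like profile. Because the test compares group means, a single discordant
sample mainly inflates the within-group variance and need not prevent genes
from being selected.

```r
## Simulated data only; Welch t-tests stand in for limma's moderated t
set.seed(42)
n.genes <- 100
n.per.set <- 10
grp <- factor(rep(c("set1", "set2"), each = n.per.set))
x <- matrix(rnorm(n.genes * 2 * n.per.set, sd = 0.5), nrow = n.genes,
            dimnames = list(paste0("g", seq_len(n.genes)),
                            paste0(grp, ".", seq_along(grp))))

## Genes 1..30 are truly higher in set2 ...
de <- 1:30
x[de, grp == "set2"] <- x[de, grp == "set2"] + 4
## ... and the first set1 sample carries the set2-like profile
x[de, 1] <- x[de, 1] + 4

## Per-gene test of set2 vs set1, Holm-adjusted, plus a fold-change filter
p <- apply(x, 1, function(v)
  t.test(v[grp == "set2"], v[grp == "set1"])$p.value)
lfc <- rowMeans(x[, grp == "set2"]) - rowMeans(x[, grp == "set1"])
selected <- p.adjust(p, method = "holm") < 0.01 & abs(lfc) >= 1

## Cluster the samples on the selected genes and compare with the labels
cl <- cutree(hclust(dist(t(x[selected, , drop = FALSE]))), k = 2)
table(cluster = cl, group = grp)
```

Comparing the cluster assignment of the first sample with its label shows
whether it has been grouped with set2 even though every selected gene passed
the significance filter.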

Is it possible that this sample is always treated as an outlier in the
significance calculations?

And if so: is it sensible to take such a misclassification as a kind of
significance?






Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg

