[BioC] significance of "wrong" clustering of differential genes
Sean Davis
sdavis2 at mail.nih.gov
Tue Nov 14 00:29:33 CET 2006
In addition to Naomi's comments, remember that a desired property of a
statistic is that it be "robust" to outliers (ignoring them when
appropriate). I think it is probably fine to have some proportion of the
samples "misclassified" by your clustering. However, when this happens, it
is a good idea to make sure that a sample mislabeling or some such thing has
not occurred. I have discovered an adult sample in what were supposed to be
pediatric samples, a mouse cell line among what were supposed to be all
canine, and other oddities like that by looking back at data. Most of the
time, though, these samples simply represent biological or technical
variation that we cannot fully explain.
Sean
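
As a rough illustration of the two points made in this thread (a clustering
can "misplace" a sample, and the result depends on the distance measure), here
is a minimal sketch in base R. Everything in it is an assumption made for
illustration: the data are simulated, the sample names s1..s12 are
hypothetical, and the constant offset added to s1 merely stands in for some
technical artifact such as a poorly normalised array. It is not the analysis
from the original question.

```r
set.seed(1)

# Simulated, hypothetical data: 50 selected genes, 6 samples per set.
# 45 genes are higher in set2 and 5 are lower (row-centred log scale).
pattern <- c(rep(1, 45), rep(-1, 5))
group   <- factor(rep(c("set1", "set2"), each = 6))
signal  <- outer(pattern, ifelse(group == "set1", -1, 1))
expr    <- signal + matrix(rnorm(50 * 12, sd = 0.5), nrow = 50)
colnames(expr) <- paste0("s", 1:12)

# Sample s1 keeps the set1 pattern but gets a constant offset of +2 on
# every gene (an assumed technical artifact, not a biological effect).
expr[, "s1"] <- expr[, "s1"] + 2

# Cluster the samples (columns) with two different distance measures.
d_euclid <- dist(t(expr))            # sensitive to the overall offset
d_cor    <- as.dist(1 - cor(expr))   # depends only on the profile shape

cl_euclid <- cutree(hclust(d_euclid, method = "average"), k = 2)
cl_cor    <- cutree(hclust(d_cor,    method = "average"), k = 2)

# Cross-tabulate the clusters against the known labels.
print(table(group, euclidean = cl_euclid))
print(table(group, correlation = cl_cor))

# Samples that share a cluster with s7 (a set2 sample) under each distance.
with_set2_euclid <- names(cl_euclid)[cl_euclid == cl_euclid["s7"]]
with_set2_cor    <- names(cl_cor)[cl_cor == cl_cor["s7"]]
```

In this simulation the offset sample s1 should end up with the set2 samples
under Euclidean distance but with its own set under correlation distance. Any
sample that lands in the "wrong" cell of such a cross-tabulation is a
candidate for the kind of look back at the sample annotation described above.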
On Monday 13 November 2006 16:02, Naomi Altman wrote:
> The heatmap did not come through (to me). However, clustering is
> highly dependent on the choice of distance measure.
>
> --Naomi
>
> At 09:57 AM 11/13/2006, Benjamin Otto wrote:
> >Hi,
> >
> >Please imagine the following situation:
> >
> >For two sample sets (set1, set2) the most differentially expressed genes
> >are identified by limma. The p.value correction would be "holm". Afterwards
> >a heatmap is printed for these genes. The procedure would look like:
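> > > f <- factor(as.character(pheno[,marker]))
> > > design <- model.matrix(~f)
> > > fit <- eBayes(lmFit(eSet,design))
> > > ## topTable sorts its rows by default; sort.by="none" keeps them in the
> > > ## same order as eSet, so the logical vector below can index eSet directly
> > > tab <- topTable(fit, coef=2, number=nrow(eSet), adjust.method="holm",
> > >                 sort.by="none")
> > > ## the log fold change column is logFC in current limma (M in old releases)
> > > selected <- tab$adj.P.Val < 0.01 & abs(tab$logFC) >= 1
> > > ## print a heatmap for eSet[selected,]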
> >
> >What can lead to a misclassification in the clustering, say one sample of
> >set1 being clustered together with set2? After all, according to the
> >workflow I have explicitly been searching for the genes which should
> >discriminate between the two sets! However, the expression values displayed
> >in the heatmap suggest that this sample IS more similar to the "wrong" set
> >than to the true one. (Have a look at the jpg.)
> >
> >Is it possible that this sample is always treated as an outlier in the
> >significance calculations?
> >
> >And if so: is it sensible to take such a misclassification as a
> >kind of significance?
> >
> >Regards
> >
> >Benjamin
> >
> >--
> >Benjamin Otto
> >Universitaetsklinikum Eppendorf Hamburg
> >Institut fuer Klinische Chemie
> >Martinistrasse 52
> >20246 Hamburg
> >
> >
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111