[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma
James W. MacDonald
jmacdon at med.umich.edu
Fri Jul 8 21:21:40 CEST 2005
Emmanuel Levy wrote:
> Dear Bioconductor community,
> I've been looking for differentially expressed genes in C. elegans after a
> drug treatment.
> There are 3 replicates of each condition and 2 conditions in total (WT and
> Drug).
> I used limma combined with either rma or mas5. I find a very very poor
> overlap in the results:
> - example (i) only 24 of the 100 most differentially expressed genes
> obtained using rma are found in
> the 1000 most differentially expressed genes obtained using mas5
> - example (ii) only 183 genes are common to the lists of the 1000 most
> differentially expressed genes
> found using both methods.
> (see piece of code at the end)
Unfortunately, this is a very common result. We recently did a study of
7 different methods for Affy data, and found very poor overlap in the
set of significant genes.
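For what it's worth, the overlap itself is easy to compute once you have one statistic per probeset from each pipeline. What follows is only a minimal sketch in base R: top_overlap() is a hypothetical helper (not part of limma or affy), and the statistics are simulated rather than taken from real chips.

```r
# Hypothetical helper: how many of the top `n_a` features ranked by |stat_a|
# also appear among the top `n_b` features ranked by |stat_b|?
# Both inputs are named numeric vectors (names = probeset IDs).
top_overlap <- function(stat_a, stat_b, n_a, n_b) {
  stopifnot(!is.null(names(stat_a)), !is.null(names(stat_b)))
  rank_a <- names(sort(abs(stat_a), decreasing = TRUE))
  rank_b <- names(sort(abs(stat_b), decreasing = TRUE))
  top_a <- rank_a[seq_len(min(n_a, length(rank_a)))]
  top_b <- rank_b[seq_len(min(n_b, length(rank_b)))]
  length(intersect(top_a, top_b))
}

# Simulated t-like statistics for 5000 probesets (made-up data):
# a shared signal plus independent noise for each preprocessing method.
set.seed(1)
ids    <- paste0("probe", 1:5000)
shared <- rnorm(5000)
t_rma  <- setNames(shared + rnorm(5000), ids)
t_mas  <- setNames(shared + rnorm(5000), ids)

top_overlap(t_rma, t_mas, n_a = 100,  n_b = 1000)  # like example (i)
top_overlap(t_rma, t_mas, n_a = 1000, n_b = 1000)  # like example (ii)
```

Ranking on a named statistic (rather than on row positions) keeps the comparison valid even if the two expression sets order their probesets differently.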
One problem with microarray data is the lack of 'true' measurements that
can be used to objectively assess the results of any given method.
Instead we are forced to judge the results based on ideas that may not
be easily defended.
For instance, in that study, we compared two different sample types
using either t-tests or a Wilcoxon rank sum, and chose the method that
gave the most 'differentially expressed' genes at the lowest false
discovery rate. I don't think you would have to argue very strenuously
that this doesn't really prove one method is better than another.
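To make that kind of comparison concrete, here is a rough sketch in base R of counting 'significant' genes per test at a fixed FDR. Everything in it is an assumption for illustration: the data are simulated, the group size of 8 is arbitrary, count_at_fdr() is a made-up helper, and Benjamini-Hochberg is just one choice of FDR adjustment, not necessarily what we used.

```r
set.seed(42)
n_genes     <- 2000
n_per_group <- 8   # assumed group size, larger than the 3 vs 3 design above
group <- factor(rep(c("A", "B"), each = n_per_group))

# Simulated log-scale expression matrix (genes in rows, samples in columns);
# the first 100 genes are shifted upwards in group B.
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)
expr[1:100, group == "B"] <- expr[1:100, group == "B"] + 2

# One p-value per gene from each test.
p_t <- apply(expr, 1, function(x)
  t.test(x[group == "A"], x[group == "B"])$p.value)
p_w <- apply(expr, 1, function(x)
  wilcox.test(x[group == "A"], x[group == "B"])$p.value)

# Hypothetical helper: number of genes passing a Benjamini-Hochberg FDR cutoff.
count_at_fdr <- function(p, fdr = 0.05) {
  sum(p.adjust(p, method = "BH") <= fdr)
}

c(t_test = count_at_fdr(p_t), wilcoxon = count_at_fdr(p_w))
```

Note that with only 3 vs 3 samples the smallest possible two-sided exact Wilcoxon p-value is 2/choose(6,3) = 0.1, so a rank test cannot flag anything at the usual cutoffs in a design that small.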
We did this analysis because my colleagues argue against using the
Affymetrix spike-in data to assess a method because you can always
'tune' a method to work best with the spike-in data, without having any
proof that it works well at all with 'real' data.
The only way I know to objectively test the different methods would be
to take some samples, randomly select many (where many == thousands)
genes to test using an agreed upon 'gold standard' (qRT-PCR, most
likely), then analyze the samples using Affy chips and see which method
correlates best with the gold standard result. It would probably only
take US$10,000 or so to do.
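If such data existed, the scoring step could look something like the sketch below. This is purely illustrative: the 'qRT-PCR' values and both sets of array fold changes are simulated, and method_A / method_B are placeholder names rather than real preprocessing methods.

```r
set.seed(7)
n_genes <- 1000

# Stand-in for the unknown true log2 fold changes of the validated genes.
true_lfc <- rnorm(n_genes, sd = 1)

# Simulated gold-standard measurement (qRT-PCR), assumed to be low-noise.
qpcr_lfc <- true_lfc + rnorm(n_genes, sd = 0.2)

# Simulated array-based log2 fold changes from two hypothetical methods:
# one noisier, one that compresses fold changes towards zero.
lfc_by_method <- list(
  method_A = true_lfc + rnorm(n_genes, sd = 0.5),
  method_B = 0.6 * true_lfc + rnorm(n_genes, sd = 0.3)
)

# Agreement of each method with the gold standard.
agreement <- sapply(lfc_by_method, function(lfc) {
  c(pearson  = cor(lfc, qpcr_lfc),
    spearman = cor(lfc, qpcr_lfc, method = "spearman"))
})
agreement
```

Correlation is insensitive to a method that systematically compresses fold changes, so one might also want to look at the slope of a regression against the gold standard.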
In the interim the only recourse as I see it is to pick a favorite
method (based on something suitably intangible) and stick with it ;-D.
> 1/ I am missing something, which wouldn't surprise me, as my expertise
> is very limited.
> In that case I am sorry for pointing out something irrelevant and thank you
> in advance for telling
> me what I'm missing,
> 2/ The differences in the normalization methods are really at the origin of
> the observed differences.
> In that case, how can I know which method is the best for my case study?
> Does a helpful paper exist
> which explains in simple words the strengths/weaknesses of each method?
> Thank you very much in advance for your help,
> -------------------------------------- CODE
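> # Load the packages that provide ReadAffy/rma/mas5 and lmFit/eBayes/topTable
> library(affy)
> library(limma)
> # Load data into an AffyBatch
> data = ReadAffy(widget=TRUE)
> # Background correction / normalization
> eset.rma = rma(data)
> eset.mas = mas5(data)
> # rma() returns log2 values but mas5() does not, so put both on the log2 scale
> exprs(eset.mas) = log2(exprs(eset.mas))
> # Get Expression values
> exp.rma = exprs(eset.rma)
> exp.mas = exprs(eset.mas)
> # --- Look for differentially expressed genes using Limma package
> # Set the levels explicitly so that WT is the reference (intercept)
> strain = factor(c("WT","WT","WT","Drug","Drug","Drug"), levels=c("WT","Drug"))
> design = model.matrix(~strain)
> colnames(design) = c("WT","DrugvsWT")
> fit.rma = lmFit(eset.rma,design)
> fit.mas = lmFit(eset.mas,design)
> fit.rma.2 = eBayes(fit.rma)
> fit.mas.2 = eBayes(fit.mas)
> # Rank on the Drug-vs-WT coefficient rather than relying on the default coef
> top.rma = rownames(topTable(fit.rma.2,coef="DrugvsWT",number=1000))
> top.mas = rownames(topTable(fit.mas.2,coef="DrugvsWT",number=100))
> length(intersect(top.rma,top.mas))
> top.rma = rownames(topTable(fit.rma.2,coef="DrugvsWT",number=100))
> top.mas = rownames(topTable(fit.mas.2,coef="DrugvsWT",number=1000))
> length(intersect(top.rma,top.mas))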
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
Ann Arbor MI 48109