[BioC] Almost nonexistent overlap of diff. expr. genes found when comparing mas5 / rma

Mon Jul 11 12:22:08 CEST 2005

Dear Emmanuel and Jim,

have you found whether the common genes are those that have the lowest 
between-replicate variation in the arrays among your list, or maybe 
those that have a signal intensity range of say between 1000 and 10000 
units?

In my short experience with arrays and MAS5, I have been able to 
validate many array results in a biological model-dependent way, 
using qPCR, which tells me that MAS5 can provide some good data. I 
haven't performed many analyses with RMA and haven't validated them 
either. But, and correct me if I am wrong, if MAS5 tends to fail at 
low intensities and shows great variation at high intensities, while 
RMA performs better at low intensities and probably similarly to MAS5 
at medium to high intensities, might the common genes be in that 
medium to high intensity range?
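
One way to check this would be to compare the average intensity and the between-replicate variation of the common genes against all the others. Below is a minimal sketch in base R, assuming a matrix of log2 expression values with probe-set IDs as row names and a vector of the IDs common to both lists. The simulated data and the names `expr`, `common_ids` and `summarise_common()` are placeholders for illustration, not part of the original analysis.

```r
# Hypothetical helper: compare genes in `common_ids` with all other genes on
# (a) mean log2 intensity, (b) between-replicate SD, and (c) the fraction whose
# mean intensity lies in a chosen range (default: 1000 to 10000 raw units).
summarise_common <- function(expr, groups, common_ids,
                             lo = log2(1000), hi = log2(10000)) {
  groups <- factor(groups)
  # Mean log2 intensity per gene across all arrays
  mean_int <- rowMeans(expr)
  # Between-replicate SD, computed within each group and then averaged
  sd_by_group <- sapply(levels(groups), function(g) {
    apply(expr[, groups == g, drop = FALSE], 1, sd)
  })
  rep_sd <- rowMeans(sd_by_group)
  in_range <- mean_int >= lo & mean_int <= hi
  in_common <- rownames(expr) %in% common_ids
  data.frame(
    set = c("common", "other"),
    n = c(sum(in_common), sum(!in_common)),
    median_intensity = c(median(mean_int[in_common]),
                         median(mean_int[!in_common])),
    median_rep_sd = c(median(rep_sd[in_common]),
                      median(rep_sd[!in_common])),
    frac_in_range = c(mean(in_range[in_common]),
                      mean(in_range[!in_common]))
  )
}

# Simulated stand-in for real data: 200 genes, 3 WT and 3 Drug arrays
set.seed(1)
expr <- matrix(rnorm(200 * 6, mean = 9, sd = 2), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))
groups <- c("WT", "WT", "WT", "Drug", "Drug", "Drug")
common_ids <- paste0("gene", 1:20)  # placeholder for the real intersection

res <- summarise_common(expr, groups, common_ids)
print(res)
```

With real data, `expr` could be the log2 expression matrix from either method (MAS5 values would need a log2 transform first) and `common_ids` the intersection of the two top lists.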

Regards,

David

> Emmanuel Levy wrote:
> > Dear Bioconductor community,
> > 
> > I've been looking for differentially expressed genes in C. elegans
> > after a drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total
> > (WT and Drug).
> > I used limma combined with either rma or mas5. I find a very, very
> > poor overlap in the results:
> > 
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in the 1000 most differentially
> > expressed genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000
> > most differentially expressed genes found using both methods.
> > (see piece of code at the end)
> 
> Unfortunately, this is a very common result. We recently did a study
> of 7 different methods for Affy data, and found very poor overlap in
> the set of significant genes.
> 
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15705192&query_hl=1
> 
> One problem with microarray data is the lack of 'true' measurements
> that can be used to objectively assess the results of any given
> method. Instead we are forced to judge the results based on ideas
> that may not be easily defended.
> 
> For instance, in the above paper, we compared two different sample
> types using either t-tests or a Wilcoxon rank sum, and chose the
> method that gave the most 'differentially expressed' genes at the
> lowest false discovery rate. I don't think you would have to argue
> very strenuously that this doesn't really prove one method is better
> than another.
> 
> We did this analysis because my colleagues argue against using the
> Affymetrix spike-in data to assess a method because you can always
> 'tune' a method to work best with the spike-in data, without having
> any proof that it works well at all with 'real' data.
> 
> The only way I know to objectively test the different methods would
> be to take some samples, randomly select many (where many == thousands)
> genes to test using an agreed upon 'gold standard' (qRT-PCR, most
> likely), then analyze the samples using Affy chips and see which
> method correlates best with the gold standard result. Probably only
> take US$10,000 or so to do.
> 
> In the interim the only recourse as I see it is to pick a favorite
> method (based on something suitably intangible) and stick with it ;-D.
> 
> Best,
> 
> Jim
> 
> 
> 
> 
> > 
> > Either
> > 1/ I am missing something, which wouldn't surprise me, as my
> > expertise is very limited.
> > 
> > In that case I am sorry for pointing out something irrelevant and
> > thank you in advance for telling me what I'm missing,
> > 
> > 2/ The differences in the normalization methods are really at the
> > origin of the observed differences.
> > In that case, how can I know which method is the best for my case
> > study? Does a helpful paper exist which explains in simple words the
> > strengths/weaknesses of each method?
> > 
> > Thank you very much in advance for your help,
> > 
> > Emmanuel
> > 
> > -------------------------------------- CODE --------------------------------------
> > library(affy)
> > library(limma)
> > 
> > # Load data into Affybatch
> > data = ReadAffy(widget=T)
> > 
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> > 
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> > 
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> > 
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> > 
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> > 
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> > length(intersect(top.rma,top.mas))
> > 
> >>[1] 24
> > 
> > 
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> > 
> >>[1] 0
> > 
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> 
> 
> -- 
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
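
A note on the quoted code, offered as a sketch rather than a diagnosis. `model.matrix(~factor(strain))` sorts the factor levels alphabetically, so the first column is the intercept (the Drug group mean) and the second is the WT-versus-Drug difference. Renaming the columns to `c("WT","Drug")` does not change that. If `topTable()` ranks on the first coefficient, the lists would reflect overall expression level rather than the treatment effect; `topTable()` has a `coef` argument for choosing the column of interest. The base-R snippet below shows the design-matrix behaviour and an overlap count keyed on probe-set IDs instead of `as.numeric(rownames(...))`. The helper `top_ids()` and the simulated statistics are hypothetical and only stand in for real limma output.

```r
strain <- c("WT", "WT", "WT", "Drug", "Drug", "Drug")

# Default: levels are sorted alphabetically, so "Drug" is the reference group
design_default <- model.matrix(~ factor(strain))
print(colnames(design_default))

# Setting the levels explicitly makes WT the reference group, so the
# second column is the Drug - WT effect
strain_f <- factor(strain, levels = c("WT", "Drug"))
design <- model.matrix(~ strain_f)
colnames(design) <- c("Intercept", "DrugVsWT")
print(design)

# Hypothetical helper: IDs of the n genes with the largest |statistic|
top_ids <- function(stat, n) {
  names(sort(abs(stat), decreasing = TRUE))[seq_len(min(n, length(stat)))]
}

# Simulated per-gene statistics for two preprocessing methods
# (placeholders for the statistics of the treatment coefficient)
set.seed(42)
ids <- paste0("probe", 1:5000)
stat_rma <- setNames(rnorm(5000), ids)
stat_mas <- setNames(0.5 * stat_rma + rnorm(5000), ids)

# Example (i): top 100 from one method found in the top 1000 of the other
overlap_100_in_1000 <- length(intersect(top_ids(stat_rma, 100),
                                        top_ids(stat_mas, 1000)))
# Example (ii): genes common to both top-1000 lists
overlap_1000_1000 <- length(intersect(top_ids(stat_rma, 1000),
                                      top_ids(stat_mas, 1000)))
print(c(overlap_100_in_1000, overlap_1000_1000))
```

Matching on IDs avoids relying on row names being numeric row indices, which may not hold for every limma version.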