[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma
kfbargad@lg.ehu.es
kfbargad at lg.ehu.es
Mon Jul 11 12:22:08 CEST 2005
Dear Emmanuel and Jim,
have you checked whether the common genes are those with the lowest
between-replicate variation among the genes in your lists, or perhaps
those with a signal intensity between, say, 1000 and 10000 units?
In my short experience with arrays and MAS5, I have been able to
validate many array results by qPCR, in a way that depends on the
biological model, which tells me that MAS5 can provide some good data.
I haven't performed many analyses with RMA and haven't validated them
either. But, and correct me if I am wrong, if MAS5 tends to fail at low
intensities, and there is great variation at high intensities, and RMA
performs better at low intensities and probably similarly to MAS5 at
medium to high intensities, might the common genes be in that medium to
high intensity range?
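One way to check this, as a rough sketch rather than a tested analysis: split the probesets in the two top lists into those found by both methods and those found by only one, then compare their typical intensity and between-replicate spread. The function name `summarise_overlap` and the simulated matrix are made up for illustration; with real data `expr` would be a log2 expression matrix and the ids the row indices of the top genes.

```r
# Hypothetical helper: compare probesets shared by two top lists with
# probesets found by only one of them.
# expr         : numeric matrix of log2 intensities (rows = probesets, cols = arrays)
# top_a, top_b : row indices of the top genes from each method
# group        : factor giving the condition of each array
summarise_overlap <- function(expr, top_a, top_b, group) {
  common <- intersect(top_a, top_b)
  unique_ids <- setdiff(union(top_a, top_b), common)

  # Mean log2 intensity across all arrays
  mean_int <- rowMeans(expr)

  # Between-replicate SD within each condition, averaged over conditions
  sd_by_group <- sapply(levels(group), function(g) {
    apply(expr[, group == g, drop = FALSE], 1, sd)
  })
  rep_sd <- rowMeans(sd_by_group)

  data.frame(
    set = c("common", "one method only"),
    n = c(length(common), length(unique_ids)),
    median_log2_intensity = c(median(mean_int[common]),
                              median(mean_int[unique_ids])),
    median_replicate_sd = c(median(rep_sd[common]),
                            median(rep_sd[unique_ids]))
  )
}

# Simulated data, for illustration only (not real array values)
set.seed(1)
expr <- matrix(rnorm(200 * 6, mean = 9, sd = 2), nrow = 200)
group <- factor(c("WT", "WT", "WT", "Drug", "Drug", "Drug"))
top_a <- 1:50     # pretend top list from method A
top_b <- 31:100   # pretend top list from method B
res <- summarise_overlap(expr, top_a, top_b, group)
print(res)
```

If the common genes really sit in the medium to high range, the "common" row should show a clearly higher median intensity and a lower replicate SD than the other row.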
Regards,
David
> Emmanuel Levy wrote:
> > Dear Bioconductor community,
> >
> > I've been looking for differentially expressed genes in C. elegans after a
> > drug treatment.
> > There are 3 replicates of each condition and 2 conditions in total (WT and
> > Drug)
> > I used limma combined with either rma or mas5. I find a very very poor
> > overlap in the results:
> >
> > - example (i) only 24 of the 100 most differentially expressed genes
> > obtained using rma are found in
> > the 1000 most differentially expressed genes obtained using mas5
> > - example (ii) only 183 genes are common to the lists of the 1000 most
> > differentially expressed genes
> > found using both methods.
> > (see piece of code at the end)
>
> Unfortunately, this is a very common result. We recently did a study of
> 7 different methods for Affy data, and found very poor overlap in the
> set of significant genes.
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15705192&query_hl=1
>
> One problem with microarray data is the lack of 'true' measurements that
> can be used to objectively assess the results of any given method.
> Instead we are forced to judge the results based on ideas that may not
> be easily defended.
>
> For instance, in the above paper, we compared two different sample types
> using either t-tests or a Wilcoxon rank sum, and chose the method that
> gave the most 'differentially expressed' genes at the lowest false
> discovery rate. I don't think you would have to argue very strenuously
> that this doesn't really prove one method is better than another.
>
> We did this analysis because my colleagues argue against using the
> Affymetrix spike-in data to assess a method because you can always
> 'tune' a method to work best with the spike-in data, without having any
> proof that it works well at all with 'real' data.
>
> The only way I know to objectively test the different methods would be
> to take some samples, randomly select many (where many == thousands)
> genes to test using an agreed upon 'gold standard' (qRT-PCR, most
> likely), then analyze the samples using Affy chips and see which method
> correlates best with the gold standard result. Probably only take
> US$10,000 or so to do.
>
> In the interim the only recourse as I see it is to pick a favorite
> method (based on something suitably intangible) and stick with it ;-D.
>
> Best,
>
> Jim
>
>
>
>
> >
> > Either
> > 1/ I am missing something, which wouldn't surprise me, as my expertise
> > is very limited.
> >
> > In that case I am sorry for pointing out something irrelevant and thank you
> > in advance for telling
> > me what I'm missing,
> >
> > 2/ The differences in the normalization methods are really at the origin of
> > the observed differences.
> > In that case, how can I know which method is the best for my case study?
> > Does a helpful paper exist
> > which explains in simple words the strengths/weaknesses of each method?
> >
> > Thank you very much in advance for your help,
> >
> > Emmanuel
> >
> > -------------------------------------- CODE --------------------------------------
> > library(affy)
> > library(limma)
> >
> > # Load data into Affybatch
> > data = ReadAffy(widget=T)
> >
> > # Background correction / normalization
> > eset.rma = rma(data)
> > eset.mas = mas5(data)
> >
> > # Get Expression values
> > exp.rma = exprs(eset.rma)
> > exp.mas = exprs(eset.mas)
> >
> > # --- Look for differentially expressed genes using Limma package
> > strain = c("WT","WT","WT","Drug","Drug","Drug")
> > design = model.matrix(~factor(strain))
> > colnames(design) = c("WT","Drug")
> >
> > fit.rma = lmFit(eset.rma,design)
> > fit.mas = lmFit(eset.mas,design)
> >
> > fit.rma.2 = eBayes(fit.rma)
> > fit.mas.2 = eBayes(fit.mas)
> >
> > # example (i): top 100 (rma) against top 1000 (mas5)
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> >
> >>[1] 24
> >
> >
> > # example (ii): top 1000 (rma) against top 1000 (mas5)
> > top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> > top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> > length(intersect(top.rma,top.mas))
> >
> >>[1] 183
> >
> >
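A side remark on the quoted code, offered as something to check rather than a diagnosis: `factor(strain)` orders its levels alphabetically, so the first column of `design` is the intercept (the Drug baseline) and the second is the WT minus Drug difference, whatever the columns are renamed to. If the top lists are ranked on the first coefficient, they would reflect expression level rather than differential expression. Also, as far as I know, `mas5` returns values on the natural scale, so they would need a `log2` before `lmFit`. The base-R sketch below uses simulated numbers and hypothetical names (`expr`, `coefs`) only to show what each coefficient measures.

```r
# Make the reference level explicit instead of relying on alphabetical order
strain <- factor(c("WT", "WT", "WT", "Drug", "Drug", "Drug"),
                 levels = c("WT", "Drug"))
design <- model.matrix(~strain)
print(colnames(design))  # "(Intercept)" "strainDrug"

# Simulated log2 expression for 5 probesets (illustration only)
set.seed(42)
expr <- matrix(rnorm(5 * 6, mean = 10, sd = 0.2), nrow = 5)
expr[1, 4:6] <- expr[1, 4:6] + 3  # probeset 1 is truly up in Drug

# Ordinary least squares per probeset; each row of coefs holds one fit
coefs <- t(solve(crossprod(design), crossprod(design, t(expr))))
colnames(coefs) <- colnames(design)
print(round(coefs, 2))

# Column 1 is the WT mean (large for every probeset);
# column 2 is the Drug - WT difference, the quantity of interest
wt_mean <- rowMeans(expr[, 1:3])
diff_drug_wt <- rowMeans(expr[, 4:6]) - wt_mean
```

With limma the corresponding step would be to ask for the second coefficient explicitly, for example `topTable(fit.rma.2, coef=2, n=1000)`.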
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
> --
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>