[BioC] Almost inexisting overlap of diff. expr. genes found when comparing mas5 / rma
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Sat Jul 9 10:09:02 CEST 2005
Yes, we often see poor overlaps. A 40-50% overlap is considered
pretty good, but it is rare unless you are considering only the top 5
genes in both lists, or something silly like that.
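One plausible reason such overlaps are low can be sketched with purely synthetic data (not the arrays discussed in this thread). Two hypothetical "methods" observe the same true gene effects plus independent noise of the same size; that noise level is an assumption chosen only for illustration. Even though both track the same underlying signal, their top-N lists share only part of their genes.

```r
# Synthetic illustration only: the noise level (sd = 1) is an assumption.
set.seed(1)
n_genes <- 10000
true_effect <- rnorm(n_genes)                    # signal shared by both methods
score_a <- true_effect + rnorm(n_genes, sd = 1)  # hypothetical method A statistic
score_b <- true_effect + rnorm(n_genes, sd = 1)  # hypothetical method B statistic
gene_ids <- paste0("gene", seq_len(n_genes))
names(score_a) <- gene_ids
names(score_b) <- gene_ids

# Fraction of the top-n genes (ranked by absolute score) present in both lists
top_overlap <- function(a, b, n) {
  top_a <- names(sort(abs(a), decreasing = TRUE))[seq_len(n)]
  top_b <- names(sort(abs(b), decreasing = TRUE))[seq_len(n)]
  length(intersect(top_a, top_b)) / n
}

overlaps <- sapply(c(100, 1000), function(n) top_overlap(score_a, score_b, n))
names(overlaps) <- c("top100", "top1000")
print(round(overlaps, 2))
```

Changing the assumed noise level changes the numbers, so treat this as a qualitative picture rather than an estimate for any real dataset.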
To make a fair comparison, try comparing the lists after both have
been filtered by the same p-value cutoff or statistic, rather than by
an arbitrarily chosen number of genes.
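A minimal base R sketch of that comparison follows. It assumes you have, for each method, a vector of raw p-values named by probe ID. The helper compare_at_cutoff() is hypothetical, and the Benjamini-Hochberg adjustment and the 0.05 cutoff are illustrative choices, not a recommendation. In recent limma versions, topTable() also takes a p.value argument (a cutoff on adjusted p-values) that can be combined with number = Inf to the same end.

```r
# Hypothetical helper: select genes from each method with the SAME
# adjusted p-value cutoff, then compare the two selections.
compare_at_cutoff <- function(p1, p2, cutoff = 0.05, method = "BH") {
  # p1, p2: raw p-values, named by gene/probe ID
  sel1 <- names(p1)[p.adjust(p1, method = method) < cutoff]
  sel2 <- names(p2)[p.adjust(p2, method = method) < cutoff]
  common <- intersect(sel1, sel2)
  all_sel <- union(sel1, sel2)
  list(n1 = length(sel1), n2 = length(sel2), common = common,
       jaccard = if (length(all_sel) == 0) NA_real_
                 else length(common) / length(all_sel))
}

# Toy p-values, made up for illustration
p_rma <- c(g1 = 0.001, g2 = 0.002, g3 = 0.2, g4 = 0.5)
p_mas <- c(g1 = 0.003, g2 = 0.4, g3 = 0.001, g4 = 0.9)
res <- compare_at_cutoff(p_rma, p_mas)
str(res)
```

Reporting the two list sizes next to the overlap matters: with a common cutoff the lists can differ a lot in length, and that difference is itself informative about how the two methods behave.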
Further, two minor cosmetic points about your code:
1) If you look at your design matrix from
strain = c("WT","WT","WT","Drug","Drug","Drug")
design = model.matrix(~factor(strain))
colnames(design) = c("WT","Drug")
design
  WT Drug
1  1    1
2  1    1
3  1    1
4  1    0
5  1    0
6  1    0
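the first column represents an intercept, not WT, and the column you
labelled Drug is in fact 1 for the WT arrays. To get one column per
group, you need to change the second line to
design = model.matrix(~ -1 + factor(strain))
Note that the factor levels are sorted alphabetically, so the columns
of this design come out in the order Drug, WT, and the colnames() line
should follow that order.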
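The base R sketch below (no limma needed) prints both parametrizations for this strain vector. It shows that R sorts the factor levels alphabetically, so "Drug" is the reference level of the intercept design, and the no-intercept design has its columns in the order Drug, WT. With the no-intercept design, the comparison of interest is a contrast such as Drug - WT, which in limma is typically set up with makeContrasts() and contrasts.fit() before eBayes(). With the intercept design, passing coef = 2 to topTable() makes explicit which coefficient is being tested.

```r
strain <- c("WT", "WT", "WT", "Drug", "Drug", "Drug")
f <- factor(strain)
print(levels(f))          # levels are sorted alphabetically: "Drug" "WT"

# Intercept parametrization (as in the original code): column 1 is the
# intercept (Drug, the reference level), column 2 is an indicator for WT.
design_int <- model.matrix(~ f)
print(colnames(design_int))

# Cell-means parametrization: one column per group, in level order,
# so the names must be assigned as Drug, WT (not WT, Drug).
design_cm <- model.matrix(~ -1 + f)
colnames(design_cm) <- levels(f)
print(design_cm)
```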
2) You do not need to force the rownames to numeric using
as.numeric(), since intersect() happily works with character vectors.
x <- c("a", "b", "c")
y <- c("b", "c", "d")
intersect(x,y)
[1] "b" "c"
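Coercing can also go wrong quietly. A small sketch, assuming the row names are Affymetrix-style probe IDs (the IDs below are made up): as.numeric() turns every such ID into NA (with a warning, suppressed here), and intersect() then reports a single NA as the "overlap".

```r
# Made-up Affymetrix-style probe IDs, for illustration only
ids_a <- c("1000_at", "1001_s_at", "1002_x_at")
ids_b <- c("1001_s_at", "1002_x_at", "1003_at")

print(intersect(ids_a, ids_b))                 # "1001_s_at" "1002_x_at"

# Forcing such IDs to numeric loses all information
num_a <- suppressWarnings(as.numeric(ids_a))   # NA NA NA
num_b <- suppressWarnings(as.numeric(ids_b))   # NA NA NA
print(intersect(num_a, num_b))                 # a single NA, not a real overlap
```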
But I do not think either of these points changes your results.
On Fri, 2005-07-08 at 18:18 +0100, Emmanuel Levy wrote:
> Dear Bioconductor community,
>
> I've been looking for differentially expressed genes in C. elegans after a
> drug treatment.
> There are 3 replicates of each condition and 2 conditions in total (WT and
> Drug). I used limma combined with either rma or mas5. I find a very, very
> poor overlap in the results:
>
> - example (i): only 24 of the 100 most differentially expressed genes
>   obtained using rma are found in the 1000 most differentially expressed
>   genes obtained using mas5
> - example (ii): only 183 genes are common to the lists of the 1000 most
>   differentially expressed genes found using both methods
> (see piece of code at the end)
>
> Either
> 1/ I am missing something (which wouldn't surprise me, as my expertise
> is very limited).
>
> In that case I am sorry for pointing out something irrelevant, and thank
> you in advance for telling me what I'm missing.
>
> 2/ The differences in the normalization methods are really at the origin of
> the observed differences.
> In that case, how can I know which method is best for my case study?
> Does a helpful paper exist that explains in simple words the
> strengths/weaknesses of each method?
>
> Thank you very much in advance for your help,
>
> Emmanuel
>
> -------------------------------------- CODE --------------------------------------
> library(affy)
> library(limma)
>
> # Load data into Affybatch
> data = ReadAffy(widget=T)
>
> # Background correction / normalization
> eset.rma = rma(data)
> eset.mas = mas5(data)
>
> # Get Expression values
> exp.rma = exprs(eset.rma)
> exp.mas = exprs(eset.mas)
>
> # --- Look for differentially expressed genes using Limma package
> strain = c("WT","WT","WT","Drug","Drug","Drug")
> design = model.matrix(~factor(strain))
> colnames(design) = c("WT","Drug")
>
> fit.rma = lmFit(eset.rma,design)
> fit.mas = lmFit(eset.mas,design)
>
> fit.rma.2 = eBayes(fit.rma)
> fit.mas.2 = eBayes(fit.mas)
>
> top.rma = as.numeric(rownames(topTable(fit.rma.2,n=1000)))
> top.mas = as.numeric(rownames(topTable(fit.mas.2,n=100)))
> length(intersect(top.rma,top.mas))
> > [1] 24
>
> top.rma = as.numeric(rownames(topTable(fit.rma.2,n=100)))
> top.mas = as.numeric(rownames(topTable(fit.mas.2,n=1000)))
> length(intersect(top.rma,top.mas))
> > [1] 0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>