[BioC] Detection of differential expression using limma
Christian Eisen
christianeisen at alice-dsl.de
Mon Oct 13 19:18:20 CEST 2008
Hi Sean and everyone,
thank you for your reply. I was aware that for log intensities the fold
change is calculated via substraction.
I have posted on example of intensity values for a ample gene in the two
groups I am interessted in.
I have listed all the respective values for that particular gene after
transformation with the VSN or quantile method
as well as simple log2 transformation of the raw data.
hsa-let-7a (gene)
10.4093909361377 12.4491486453754 (log2 transformed data)
10.0910975543547 10.5456558587892 (VSN transformed data)
10.1711963483519 10.4027789253209 (quantile transformed data)
1360 5592 (raw data)
Like mentioned previously I am using the limma package.
I read the tab-deliminted files containing the raw data as advised by
Gordon Smyth and Peter White in a
previous post handling the same topic.
mrawObj <- read.maimages(targets$FileName,
columns = list(G = "gMedianSignal", Gb = "gBGUsed", R =
"gProcessedSignal",Rb = "gBGMedianSignal"),
annotation= c("ProbeName", "GeneName", "SystematicName","ControlType"))
mnormObj <- mrawObj
Then I perform normalization only on the green channel
mnormObj$G <-normalizeBetweenArrays(mnormObj$G,method="quantile")
mnormObj$G <- log2(mnormObj$G)
After that I extract the transformed intensity data for the green
channel into a new
object and name the rows after the genes. As I have data for 16 arrays,
this newly
created object contains 16 columns, one for each array, and around
13.000 rows,
one for each individual gene.
preproc_data<-mnormObj_vsn$G
rownames(preproc_data)<-mnormObj_vsn$genes$GeneName
For identification of differentially expressed genes I proceed as described
in the limma userguide by defining a model.matrix and a cont.matrix to
set the
comparisons that I want the program to compute.
After this I fit a linear model to my data.
fit <- lmFit(preproc_data, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
and display the results for the different comparisons listed in the
cont.matrix
Below a sample output from the VSN transformed data
> topTable(fit2, coef=1, adjust="BH")
ID logFC t
P.Value adj.P.Val B
ebv-miR-BART18-5p -2.788856 -3.925838 0.001719277
0.9999662 -4.501933
hsa-miR-671-5p 2.378257 3.875517 0.001891590
0.9999662 -4.503216
hsa-miR-663 2.308951 3.872746 0.001901578
0.9999662 -4.503287
hsa-miR-583 -1.509068 -3.758990
0.002361635 0.9999662 -4.506259
hsa-miR-671-5p 2.097918 3.661009 0.002848245
0.9999662 -4.508897
hsa-miR-671-5p 2.055704 3.576170 0.003351448
0.9999662 -4.511239
hsa-miR-663 2.354609 3.513237
0.003782239 0.9999662 -4.513012
hsa-miR-671-5p 2.136401 3.480265 0.004029907
0.9999662 -4.513952
hsa-miR-145* -3.167342 -3.452580
0.004250482 0.9999662 -4.514748
hsa-miR-373 1.602810 3.381800
0.004871484 0.9999662 -4.516809
so the strange thing is, when looking at the entry for the top hit
indetified by limma anylsis you get something like this
ebv-miR-BART18-5p
5.20945336562895 5.4093909361377 (log2 transformation)
-0.971205336133305 1.69436870845910 (VSN transformed data)
5.19967234483636 5.31174831500496 (quantile transformation)
37 42.5 (raw data)
Now you may say, sure this is clearly a relict of VSN normalization. No
doubt about that
but for the other methods the problem is still similar just the names of
the genes are different
and the fold change and p-Values.
So as you see I don't understand why, for example the example above
let-7a doesn't show up
in the top differentially expressed gene list, while others, which a
clearly not differentially expressed
do.
Thanks for any kind of advice helping me out on this!
More information about the Bioconductor
mailing list