[BioC] Detection of differential expression using limma

Mon Oct 13 19:18:20 CEST 2008

Hi Sean and everyone,

thank you for your reply. I was aware that for log intensities the fold 
change is calculated via substraction.
I have posted on example of intensity values for a ample gene in the two 
groups I am interessted in.
I have listed all the respective values for that particular gene after 
transformation with the VSN or quantile method
as well as simple log2 transformation of the raw data.

hsa-let-7a (gene)
10.4093909361377 12.4491486453754  (log2 transformed data)
10.0910975543547 10.5456558587892  (VSN transformed data)
10.1711963483519 10.4027789253209  (quantile transformed data)
1360                         5592                        (raw data)

Like mentioned previously I am using the limma package.
I read the tab-deliminted files containing the raw data as advised by 
Gordon Smyth and Peter White in a
previous post handling the same topic.

mrawObj <- read.maimages(targets$FileName,
columns = list(G = "gMedianSignal", Gb = "gBGUsed", R = 
"gProcessedSignal",Rb = "gBGMedianSignal"),
annotation= c("ProbeName", "GeneName", "SystematicName","ControlType"))
mnormObj <- mrawObj

Then I perform normalization only on the green channel

mnormObj$G <-normalizeBetweenArrays(mnormObj$G,method="quantile")
mnormObj$G <- log2(mnormObj$G)

After that I extract the transformed intensity data for the green 
channel into a new
object and name the rows after the genes. As I have data for 16 arrays, 
this newly
created object contains 16 columns, one for each array, and around 
13.000 rows,
one for each individual gene.

preproc_data<-mnormObj_vsn$G
rownames(preproc_data)<-mnormObj_vsn$genes$GeneName

For identification of differentially expressed genes I proceed as described
in the limma userguide by defining a model.matrix and a cont.matrix to 
set the
comparisons that I want the program to compute.
After this I fit a linear model to my data.

fit <- lmFit(preproc_data, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

and display the results for the different comparisons listed in the 
cont.matrix

Below a sample output from the VSN transformed data

 > topTable(fit2, coef=1, adjust="BH")
           ID                               logFC               t       
         P.Value            adj.P.Val             B
ebv-miR-BART18-5p       -2.788856     -3.925838     0.001719277     
0.9999662     -4.501933
hsa-miR-671-5p                 2.378257      3.875517     0.001891590 
    0.9999662     -4.503216
hsa-miR-663                       2.308951      3.872746     0.001901578 
    0.9999662     -4.503287
hsa-miR-583                      -1.509068     -3.758990     
0.002361635     0.9999662     -4.506259
hsa-miR-671-5p                  2.097918      3.661009     0.002848245 
    0.9999662     -4.508897
hsa-miR-671-5p                  2.055704      3.576170     0.003351448 
    0.9999662     -4.511239
hsa-miR-663                        2.354609      3.513237     
0.003782239     0.9999662     -4.513012
hsa-miR-671-5p                   2.136401      3.480265     0.004029907 
    0.9999662     -4.513952
hsa-miR-145*                     -3.167342     -3.452580     
0.004250482     0.9999662     -4.514748
hsa-miR-373                        1.602810      3.381800     
0.004871484     0.9999662     -4.516809

so the strange thing is, when looking at the entry for the top hit 
indetified by limma anylsis you get something like this

ebv-miR-BART18-5p 
5.20945336562895     5.4093909361377   (log2 transformation)
-0.971205336133305 1.69436870845910  (VSN transformed data)
5.19967234483636     5.31174831500496 (quantile transformation)
37                                42.5                         (raw data)

Now you may say, sure this is clearly a relict of VSN normalization. No 
doubt about that
but for the other methods the problem is still similar just the names of 
the genes are different
and the fold change and p-Values.

So as you see I don't understand why, for example the example above 
let-7a doesn't show up
in the top differentially expressed gene list, while others, which a 
clearly not differentially expressed
do.

Thanks for any kind of advice helping me out on this!