[BioC] Detection of differential expression using limma

Mon Oct 13 19:49:57 CEST 2008

On Mon, Oct 13, 2008 at 1:18 PM, Christian Eisen
<christianeisen at alice-dsl.de> wrote:
> Hi Sean and everyone,
>
> thank you for your reply. I was aware that for log intensities the fold
> change is calculated via substraction.
> I have posted on example of intensity values for a ample gene in the two
> groups I am interessted in.
> I have listed all the respective values for that particular gene after
> transformation with the VSN or quantile method
> as well as simple log2 transformation of the raw data.
>
> hsa-let-7a (gene)
> 10.4093909361377 12.4491486453754  (log2 transformed data)
> 10.0910975543547 10.5456558587892  (VSN transformed data)
> 10.1711963483519 10.4027789253209  (quantile transformed data)
> 1360                         5592                        (raw data)
>
> Like mentioned previously I am using the limma package.
> I read the tab-deliminted files containing the raw data as advised by Gordon
> Smyth and Peter White in a
> previous post handling the same topic.
>
> mrawObj <- read.maimages(targets$FileName,
> columns = list(G = "gMedianSignal", Gb = "gBGUsed", R =
> "gProcessedSignal",Rb = "gBGMedianSignal"),
> annotation= c("ProbeName", "GeneName", "SystematicName","ControlType"))
> mnormObj <- mrawObj

These arrays appear to be Agilent miRNA arrays.  I think the Agilent
recommendation is to use the Total Gene Signal for these arrays and
not the processed signal--you'll want to read the image extraction
software manual for the details.  Also, normalization of miRNA arrays
is not a solved problem because such a large proportion of the data
show differential expression.  I think there is a general feeling that
methods like vsn might be "better" than more traditional methods, but
I wouldn't say that there is a consensus on that and I do not have
literature to support that claim (perhaps others do).  There are
several discussions in the email archive on normalization of miRNA.  I
would suggest looking back at them for other ideas.

Sean

> Then I perform normalization only on the green channel
>
> mnormObj$G <-normalizeBetweenArrays(mnormObj$G,method="quantile")
> mnormObj$G <- log2(mnormObj$G)
>
> After that I extract the transformed intensity data for the green channel
> into a new
> object and name the rows after the genes. As I have data for 16 arrays, this
> newly
> created object contains 16 columns, one for each array, and around 13.000
> rows,
> one for each individual gene.
>
> preproc_data<-mnormObj_vsn$G
> rownames(preproc_data)<-mnormObj_vsn$genes$GeneName
>
> For identification of differentially expressed genes I proceed as described
> in the limma userguide by defining a model.matrix and a cont.matrix to set
> the
> comparisons that I want the program to compute.
> After this I fit a linear model to my data.
>
> fit <- lmFit(preproc_data, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> and display the results for the different comparisons listed in the
> cont.matrix
>
> Below a sample output from the VSN transformed data
>
>> topTable(fit2, coef=1, adjust="BH")
>          ID                               logFC               t
>  P.Value            adj.P.Val             B
> ebv-miR-BART18-5p       -2.788856     -3.925838     0.001719277
> 0.9999662     -4.501933
> hsa-miR-671-5p                 2.378257      3.875517     0.001891590
> 0.9999662     -4.503216
> hsa-miR-663                       2.308951      3.872746     0.001901578
> 0.9999662     -4.503287
> hsa-miR-583                      -1.509068     -3.758990     0.002361635
> 0.9999662     -4.506259
> hsa-miR-671-5p                  2.097918      3.661009     0.002848245
> 0.9999662     -4.508897
> hsa-miR-671-5p                  2.055704      3.576170     0.003351448
> 0.9999662     -4.511239
> hsa-miR-663                        2.354609      3.513237     0.003782239
>   0.9999662     -4.513012
> hsa-miR-671-5p                   2.136401      3.480265     0.004029907
> 0.9999662     -4.513952
> hsa-miR-145*                     -3.167342     -3.452580     0.004250482
> 0.9999662     -4.514748
> hsa-miR-373                        1.602810      3.381800     0.004871484
>   0.9999662     -4.516809
>
> so the strange thing is, when looking at the entry for the top hit
> indetified by limma anylsis you get something like this
>
> ebv-miR-BART18-5p 5.20945336562895     5.4093909361377   (log2
> transformation)
> -0.971205336133305 1.69436870845910  (VSN transformed data)
> 5.19967234483636     5.31174831500496 (quantile transformation)
> 37                                42.5                         (raw data)
>
> Now you may say, sure this is clearly a relict of VSN normalization. No
> doubt about that
> but for the other methods the problem is still similar just the names of the
> genes are different
> and the fold change and p-Values.
>
> So as you see I don't understand why, for example the example above let-7a
> doesn't show up
> in the top differentially expressed gene list, while others, which a clearly
> not differentially expressed
> do.
>
> Thanks for any kind of advice helping me out on this!
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>