[BioC] Detection of differential expression using limma
Sean Davis
sdavis2 at mail.nih.gov
Mon Oct 13 19:49:57 CEST 2008
On Mon, Oct 13, 2008 at 1:18 PM, Christian Eisen
<christianeisen at alice-dsl.de> wrote:
> Hi Sean and everyone,
>
> thank you for your reply. I was aware that for log intensities the fold
> change is calculated via substraction.
> I have posted on example of intensity values for a ample gene in the two
> groups I am interessted in.
> I have listed all the respective values for that particular gene after
> transformation with the VSN or quantile method
> as well as simple log2 transformation of the raw data.
>
> hsa-let-7a (gene)
> 10.4093909361377 12.4491486453754 (log2 transformed data)
> 10.0910975543547 10.5456558587892 (VSN transformed data)
> 10.1711963483519 10.4027789253209 (quantile transformed data)
> 1360 5592 (raw data)
>
> Like mentioned previously I am using the limma package.
> I read the tab-deliminted files containing the raw data as advised by Gordon
> Smyth and Peter White in a
> previous post handling the same topic.
>
> mrawObj <- read.maimages(targets$FileName,
> columns = list(G = "gMedianSignal", Gb = "gBGUsed", R =
> "gProcessedSignal",Rb = "gBGMedianSignal"),
> annotation= c("ProbeName", "GeneName", "SystematicName","ControlType"))
> mnormObj <- mrawObj
These arrays appear to be Agilent miRNA arrays. I think the Agilent
recommendation is to use the Total Gene Signal for these arrays and
not the processed signal--you'll want to read the image extraction
software manual for the details. Also, normalization of miRNA arrays
is not a solved problem because such a large proportion of the data
show differential expression. I think there is a general feeling that
methods like vsn might be "better" than more traditional methods, but
I wouldn't say that there is a consensus on that and I do not have
literature to support that claim (perhaps others do). There are
several discussions in the email archive on normalization of miRNA. I
would suggest looking back at them for other ideas.
Sean
> Then I perform normalization only on the green channel
>
> mnormObj$G <-normalizeBetweenArrays(mnormObj$G,method="quantile")
> mnormObj$G <- log2(mnormObj$G)
>
> After that I extract the transformed intensity data for the green channel
> into a new
> object and name the rows after the genes. As I have data for 16 arrays, this
> newly
> created object contains 16 columns, one for each array, and around 13.000
> rows,
> one for each individual gene.
>
> preproc_data<-mnormObj_vsn$G
> rownames(preproc_data)<-mnormObj_vsn$genes$GeneName
>
> For identification of differentially expressed genes I proceed as described
> in the limma userguide by defining a model.matrix and a cont.matrix to set
> the
> comparisons that I want the program to compute.
> After this I fit a linear model to my data.
>
> fit <- lmFit(preproc_data, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> and display the results for the different comparisons listed in the
> cont.matrix
>
> Below a sample output from the VSN transformed data
>
>> topTable(fit2, coef=1, adjust="BH")
> ID logFC t
> P.Value adj.P.Val B
> ebv-miR-BART18-5p -2.788856 -3.925838 0.001719277
> 0.9999662 -4.501933
> hsa-miR-671-5p 2.378257 3.875517 0.001891590
> 0.9999662 -4.503216
> hsa-miR-663 2.308951 3.872746 0.001901578
> 0.9999662 -4.503287
> hsa-miR-583 -1.509068 -3.758990 0.002361635
> 0.9999662 -4.506259
> hsa-miR-671-5p 2.097918 3.661009 0.002848245
> 0.9999662 -4.508897
> hsa-miR-671-5p 2.055704 3.576170 0.003351448
> 0.9999662 -4.511239
> hsa-miR-663 2.354609 3.513237 0.003782239
> 0.9999662 -4.513012
> hsa-miR-671-5p 2.136401 3.480265 0.004029907
> 0.9999662 -4.513952
> hsa-miR-145* -3.167342 -3.452580 0.004250482
> 0.9999662 -4.514748
> hsa-miR-373 1.602810 3.381800 0.004871484
> 0.9999662 -4.516809
>
> so the strange thing is, when looking at the entry for the top hit
> indetified by limma anylsis you get something like this
>
> ebv-miR-BART18-5p 5.20945336562895 5.4093909361377 (log2
> transformation)
> -0.971205336133305 1.69436870845910 (VSN transformed data)
> 5.19967234483636 5.31174831500496 (quantile transformation)
> 37 42.5 (raw data)
>
> Now you may say, sure this is clearly a relict of VSN normalization. No
> doubt about that
> but for the other methods the problem is still similar just the names of the
> genes are different
> and the fold change and p-Values.
>
> So as you see I don't understand why, for example the example above let-7a
> doesn't show up
> in the top differentially expressed gene list, while others, which a clearly
> not differentially expressed
> do.
>
> Thanks for any kind of advice helping me out on this!
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list