[BioC] Very low P-values in limma

Paul Geeleher paulgeeleher at gmail.com
Fri Oct 23 11:50:52 CEST 2009


Yes some of the fold changes are quite large and seeing as my code
appears to be correct it looks like these results are right. I guess
it comes as a surprise to some people here as the same dataset was
analysed using different software which showed no significance. Well
done Limma!

Paul.

On Fri, Oct 23, 2009 at 3:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Paul,
>
> I'm not quite sure why you're so shocked about getting significant results.
>  Usually people complain to me when they don't get significance :).
>
> limma is able to obtain much higher significance levels than a t-test
> because it (i) leverages information from the within-array replicates and
> (ii) borrows information across probes.  These two approaches in combination
> increase the effective degrees of freedom dramatically.
>
> We would hardly go to so much time and trouble to develop new statistical
> methods if they didn't improve on ordinary t-tests.
>
> If I were you, I'd be looking at the sizes of the fold changes, and whether
> the results seem biologically sensible and agree with known information, and
> so on.
>
> Best wishes
> Gordon
>
>
> ---------- original message -----------
> [BioC] Very low P-values in limma
> Paul Geeleher paulgeeleher at gmail.com
> Thu Oct 22 11:34:01 CEST 2009
>
> The value of df.prior is 3.208318. Hopefully somebody can help me here
> because I'm still really at a loss on why these values are so low.
>
> Here's the script. I'm fairly sure it conforms to the documentation:
>
> library(affy)
>
> setwd('/home/paul/PhD_Stuff/miRNA_guilleme_project/HERArrays/')
>
> # create a color pallete for boxplots
> cols <- brewer.pal(12, "Set3")
>
> # This defines the column name of the mean Cy3 foreground intensites
> #Cy3 <- "F532 Median"
> # background corrected value
> Cy3 <- "F532 Median - B532"
>
> # This defines the column name of the mean Cy3 background intensites
> Cy3b <- "B532 Mean"
>
> # Read the targets file (see limmaUsersGuide for info on how to create this)
> targets <- readTargets("targets.csv")
>
> # read values from gpr file
> # because there is no red channel, read Rb & R to be the same values as G
> and Gb
> # this is a type of hack that allows the function to work
> RG <- read.maimages( targets$FileName,
> source="genepix",columns=list(R=Cy3,G=Cy3, Rb=Cy3b, Gb=Cy3b))
>
> # remove the extraneous values red channel values
> RG$R <- NULL
> RG$Rb <- NULL
>
> # this line of code will remove any negative values which cause errors
> further on
> RG$G[RG$G<0] <- 0
>
> # create a pData for the data just read
> # this indicates which population each array belongs to
> # a are "her-" and b are "her+" (almost certain)
> pData <- data.frame(population = c('a', 'a', 'a', 'a', 'b', 'b', 'b'))
> rownames(pData) <-  RG$targets$FileName
>
> # create design matrix
> design <- model.matrix(~factor(pData$population))
>
> # In my .gpr files all miRNAs contain the string "-miR-" or "-let-" in their
> name
> # so the grep function can be used to extract all of these, removing
> # all control signals and printing buffers etc.
> # You need to check your .gpr files to find which signals you should
> extract.
> miRs <- c(grep("-miR-", RG$genes$Name), grep("-let-", RG$genes$Name))
> RG.final <- RG[miRs, ]
>
> # load vsn library, this contains the normalization functions
> library('vsn')
>
> # Do VSN normalization and output as vns object
> mat <- vsnMatrix(RG.final$G)
>
> # insert rownames into matrix with normalized data
> # this will mean that the gene names will appear in our final output
> rownames(mat at hx) <- RG.final$genes$Name
>
> # my .gpr files contain 4 "within-array replicates" of each miRNA.
> # We need to make full use of this information by calculating the duplicate
> correlation
> # in order to use the duplicateCorrelation() function below,
> # we need to make sure that the replicates of each gene appear in sequence
> in this matrix,
> # so we sort the normalized values so the replicate groups of 4 miRs appear
> in sequence
> mat at hx <- mat at hx[order(rownames(mat at hx)), ]
>
> # calculate duplicate correlation between the 4 replicates on each array
> corfit <- duplicateCorrelation(mat, design, ndups=4)
>
> # fit the linear model, including info on duplicates and correlation
> fit <- lmFit(mat, design, ndups=4, correlation=corfit$consensus)
>
> # calculate values using ebayes
> ebayes <- eBayes(fit)
>
> # output a list of top differnetially expressed genes...
> topTable(ebayes, coef = 2, adjust = "BH", n = 10)
>
>
> -Paul.
>
>
> On Thu, Oct 22, 2009 at 12:09 AM, Wei Shi <shi at wehi.edu.au> wrote:
>
>> Dear Paul:
>>
>>  The low p-values you have got are not surprising to me. I have got
>
> even
>>
>> lower FDR adjusted p-values than yours. This just means you got
>> significantly differentially expressed genes. But on the other hand, I
>
> did
>>
>> see much higher adjusted p-values in some of my microarray analyses. In
>
> that
>>
>> case, I will explore the data in more depth such as looking at batch
>
> effect
>>
>> etc.
>>
>> Cheers,
>> Wei
>>
>>
>> Paul Geeleher wrote:
>>
>>> Hi folks, I'm analyzing microRNA data using limma and I'm wondering
>
> about
>>>
>>> the validity of the p-values I'm getting out. Its a simple 'Group A Vs
>>> Group
>>> B' experimental design. 4 arrays in one group, 3 in the other and 4
>>> duplicate spots for each miRNA on each array.
>>>
>>> The lowest adjusted p-values in the differential expression analysis
>
> are
>>>
>>> in
>>> the region of 10^-7.
>>>
>>> Its been pointed out to me that plugging the point values from each
>
> sample
>>>
>>> into a regular t-test you get p=0.008, which then also needs to take
>
> the
>>>
>>> multiple test hit. Can anybody explain why limma is giving me such
>
> lower
>>>
>>> values and if they are valid?
>>>
>>> I can provide more information if required.
>>>
>>> Thanks,
>>>
>>> Paul.
>>>
>>>
>>>
>>
>
>
> --
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
>
>



-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland



More information about the Bioconductor mailing list