[BioC] LIMMA P-value calculations/Suggestions for flagged data
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Mar 22 12:49:31 CET 2007
> Date: Wed, 21 Mar 2007 16:04:31 -0400
> From: "Lance E. Palmer" <lance.palmer at stonybrook.edu>
> Subject: [BioC] LIMMA P-value calculations/Suggestions for flagged
> data
> To: bioconductor at stat.math.ethz.ch
>
> I just had a question/concern about P value calculations in Limma (I am
> using the latest version of Bioconductor).
>
> I recently ran 3 arrays through my analysis. The slides were analyzed
> with Genepix software. There were a couple of genes that concerned me.
> One had a log fold change of -3.765. The adjusted p-value (fdr)
> was .027. I looked at the individual M values for the arrays and they
> were -0.009336, 0.09217 and -3.765.
>
> I noticed that the first two arrays had a 'not found' flag. So
> basically the analysis gave a significant P-value using only 1 piece of
> data. Is this something that is correct?
Yes, it is correct. If only one data value has weight > 0 for a particular probe, then that
probe has no residual degrees of freedom of its own, so limma uses the empirical Bayes prior
standard deviation, which is estimated from all the other probes, to form the t-statistic.
Think of it this way. You observed M = -3.765 for this probe, which is a large negative value.
You know from looking at the other probes that the standard deviation of M-values is usually
around 0.03, say, so -3.765 is very likely to be genuinely different from zero.
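To make the arithmetic behind this reasoning concrete, here is a minimal Python sketch of a moderated t-statistic for a single probe in a one-sample design. It is not limma's code: the function name `moderated_t`, the prior degrees of freedom of 4, and the prior standard deviation of 0.03 (the illustrative figure used above) are all assumptions made for the example. The point it shows is that when only one value has positive weight, the residual sum of squares is zero and the posterior standard deviation collapses to the prior one.

```python
import math

def moderated_t(m_values, weights, prior_sd, prior_df):
    """Sketch of a moderated t-statistic for one probe (hypothetical helper).

    Values with weight 0 are ignored. The posterior variance shrinks the
    probe's own residual variance towards the prior variance.
    """
    used = [(m, w) for m, w in zip(m_values, weights) if w > 0]
    total_w = sum(w for _, w in used)
    mean = sum(w * m for m, w in used) / total_w       # weighted mean M-value
    resid_df = len(used) - 1                           # 0 if one value is left
    resid_ss = sum(w * (m - mean) ** 2 for m, w in used)
    # Posterior variance: weighted average of prior and residual variance.
    post_var = (prior_df * prior_sd ** 2 + resid_ss) / (prior_df + resid_df)
    t = mean / math.sqrt(post_var / total_w)
    return t, prior_df + resid_df

# The probe from the question: the first two arrays are flagged (weight 0).
# prior_sd and prior_df are assumed values, not estimates from real data.
t, df = moderated_t([-0.009336, 0.09217, -3.765], [0, 0, 1],
                    prior_sd=0.03, prior_df=4.0)
print(t, df)  # t is about -125.5 on 4 degrees of freedom
```

With these assumed numbers the statistic is simply -3.765 / 0.03, referred to a t-distribution whose degrees of freedom come entirely from the prior.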
> I also wonder if I should even remove 'not found' flagged data.
> Originally I did not, but someone suggested I do. I originally did not
> remove it because of the case listed above.
I've argued on this mailing list and elsewhere for a long time that, rather than flagging faint
spots, it's better to use a background correction method that avoids a blow-up of M-values
at low intensities.
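As a rough illustration of what a blow-up at low intensities means, the Python sketch below computes M = log2(R/G) for a faint and a bright spot, first with plain background subtraction and then with a constant offset added to both channels. The intensities, the offset of 50, and the helper `log_ratio` are invented for the example, and adding an offset is only one simple way of damping the effect, not necessarily the method meant above.

```python
import math

def log_ratio(red_fg, red_bg, green_fg, green_bg, offset=0.0):
    """M-value after background subtraction plus an optional offset.

    Hypothetical helper: returns NaN if either corrected intensity is
    not positive, since the log-ratio is then undefined.
    """
    red = red_fg - red_bg + offset
    green = green_fg - green_bg + offset
    if red <= 0 or green <= 0:
        return float("nan")
    return math.log2(red / green)

# Faint spot: foreground barely above background in both channels.
faint_plain = log_ratio(105, 100, 140, 100)              # log2(5/40)
faint_offset = log_ratio(105, 100, 140, 100, offset=50)  # log2(55/90)

# Bright spot: the same offset barely changes the log-ratio.
bright_plain = log_ratio(10100, 100, 20100, 100)
bright_offset = log_ratio(10100, 100, 20100, 100, offset=50)

print(faint_plain, faint_offset)
print(bright_plain, bright_offset)
```

The faint spot's M-value moves from -3 to roughly -0.7 once the offset is added, while the bright spot's M-value stays close to -1, so the extreme value at low intensity comes largely from the correction rather than from the data.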
Best wishes
Gordon
> However, the case above tells us something about the experiments. How
> do people deal with this situation?
>
> -Lance Palmer