[BioC] Limma weights, P values and differential expression
Gordon Smyth
smyth at wehi.EDU.AU
Sat Aug 26 02:52:06 CEST 2006
Dear Alfredo,
At 08:00 PM 25/08/2006, bioconductor-request at stat.math.ethz.ch wrote:
>Date: Thu, 24 Aug 2006 07:29:07 -0700 (PDT)
>From: Alfredo Juncal <alfjuncal at yahoo.com>
>Subject: [BioC] Limma weights, P values and differential expression
>To: bioconductor at stat.math.ethz.ch
>
>Dear Limma users,
>
> Although I have read the manual and searched the mailing list, I
> still do not fully understandf how weights are used when applying
> the linear model.
In limma, weights are always treated as relative weights in model
fits. This approach is customary in regression analysis and is
explained in most statistics textbooks. A consequence of this
approach is that the absolute size of the weights is immaterial: you
will get the same differential expression result if you set all the
weights to 100, or all the weights to 0.01, or all the weights to 1.
At the normalization stage, a loess curve is fitted for each array.
Here the relative sizes of the weights for different probes are used.
Probes with lower weights for that array get less weight.
At the differential expression stage, a linear model is fitted for
each probe. Here it is the relative sizes of the weights for
different arrays for that probe which are relevant. Arrays with lower
weights get downweighted. But note, if all the weights for a given
probe are the same size, it doesn't matter what that size is. If a
particular probe gets weight 0.001 on every array, then the
differential expression analysis will be exactly the same as if that
probe got weight 1000 on every array. The weights are only used to
differentially weight the different arrays. Weights are not used to
downweight one probe relative to another in the differential
expression analysis, at least not unless the weights are actually zero.
> In our case we have four cDNA microarrays (2 biological replicates
> with 2 dye-swaps for each) in which we compare a mutant with a wt
> sample. The microarrays have each gene printed once. We use
> different spot quality filters and we apply a 0.01 weight to the
> spots which do not meet these criteria or which are controls. We
> then apply lmFit with weights and use BH to adjust for multiple
> testing. These downweighted spots do not affect the Loess print-tip
> normalization, but how are they treated in the linear model?
>
> For example, some of the genes with 3 replicate spots with weight
> 0.01 and 1 with weight 1 have an adjusted P value < 0.05. Is Limma
> "taking into consideration" that 3 of the spots are suboptimal when
> calculating the Adj. P.Vals.? Could this gene be considered fairly
> reliable because although 3 of the spots are suboptimal the fact
> that they have a similar ratio than the one which is optimal "adds
> strength" to the differential expression of that gene?
Yes, and that seems reasonable to me.
Also note that weights c(1,0.01,0.01,0.01) are will give the same
result as c(100,1,1,1). It is only the relative size which is important.
> If we set the 0.01 weights to 0.00 after the normalization, Adj.P
> Values are not calculated (as expected displays NA) for those genes
> with all four weights being 0. However, P values are calculated for
> those genes with only one weight being 1. How come there is a Pval
> in the latter if there is only 1 optimal observation?
... because the Bayesian approach supplies a standard deviation
estimate even when the data for that gene does not.
>The M values are almost identical doing it this way or leaving the
>weights at 0.01, but the Adj.P.Vals are somewhat different (Example:
>3000 genes with P<0.05 doing it either way but around 1000 genes are
>< 0.05 only by doing it one way or the other). However, no
>differences with genes over log2ratio 1 or -1 and very slight ones
>(12 genes) with genes log2ratio < 0.58 or -0.58.
>
>In summary, after obtaining the Limma results list, would you set an
>extra filter to select only those genes which have N spots meeting
>the quality criteria? Is it necessary? Which value should N be?
>Thank you very much,
I don't see why you would, and from the results you report is seems
that limma is doing reasonable things with your data already.
It seems to me that if you really do not want a particular gene to be
appearing in your gene list, even when the data for that gene seem
perfectly reasonable, then you should remove that gene from the
analysis or give the spots zero weight, rather than adding an extra
ad hoc step after the analysis.
Best wishes
Gordon
>A. Juncal
More information about the Bioconductor
mailing list