[BioC] Your thoughts on Limma Array Weights?

Fri Jun 27 03:37:14 CEST 2008

Hi Paul and Matt,
Have either of you compared the options when there are only a small number
of arrays in a two-group comparison, e.g. 2 vs 2 or 3 vs 3, and you either
throw one array away (due to QC) or just down-weight it?
I commonly throw the poor array away, but one data set that I'm
currently working on only has 2 vs 2 (read: "pilot experiment"), and
when you remove an array it becomes 2 vs 1, which is not much fun.
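
As a minimal sketch of why the 2 vs 1 case is uncomfortable, assuming a
single made-up gene and plain lm() rather than limma: down-weighting keeps
all four observations in the fit, so 2 residual degrees of freedom remain,
whereas removing the array leaves only 1. The expression values and the
weight of 0.2 are invented for illustration; they do not come from the data
set mentioned above, and the weight is not an arrayWeights() estimate.

```r
# Hypothetical log-expression values for one gene in a 2 vs 2 design.
# These numbers are made up purely for illustration.
y     <- c(7.1, 7.3, 8.9, 8.2)
group <- factor(c("A", "A", "B", "B"))

# Option 1: keep all four arrays but down-weight the suspect array 4.
# The weight 0.2 is an arbitrary choice, not an estimated array weight.
w <- c(1, 1, 1, 0.2)
fit_weighted <- lm(y ~ group, weights = w)

# Option 2: drop array 4 altogether, leaving 2 vs 1.
keep <- c(TRUE, TRUE, TRUE, FALSE)
fit_removed <- lm(y ~ group, subset = keep)

# Residual degrees of freedom available for estimating the variance.
df_weighted <- df.residual(fit_weighted)  # 4 observations - 2 coefficients
df_removed  <- df.residual(fit_removed)   # 3 observations - 2 coefficients

print(c(weighted = df_weighted, removed = df_removed))
```

In limma the moderated t-statistic borrows extra degrees of freedom across
genes, so the practical difference may be smaller than this single-gene
picture suggests, but the per-gene residual degrees of freedom still drop
in the same way.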

cheers,
Mark

On 27/06/2008, at 9:53 AM, Matt Ritchie wrote:

> Dear Paul,
>
> I have noticed cases where the results are 'better' (i.e. you get more
> extreme moderated t-statistics or log-odds) if you remove suspect arrays.
> In one recent example I recall, the experimenter eventually discovered that
> the genotype of a sample hybridised to one of their arrays was not what they
> originally thought.  This meant that the linear model they were fitting was
> not right.  Although the weight assigned to this array was small, removing
> it from the analysis altogether still produced better results.  The array
> weights method cannot correct for these kinds of gross errors.
>
> I usually take a try-it-and-see approach in my own analyses, similar to what
> you have done (i.e. run the analysis with equal weights, with array weights,
> or after removing any suspect arrays altogether, then look at the results of
> each to see which gives the most extreme statistics).
>
> Best wishes,
>
> Matt
>
>> I use limma quite a bit but had not really been using arrayWeights
>> much until recently.
>> I like it a lot but have found, in some cases, that it appears better
>> to just ditch the very poorly performing arrays and then proceed without
>> weighting.
>>
>> What is people's real-world experience with arrayWeights? Are you using
>> it routinely?
>>
>> For example, my typical usage: a time series with biological triplicates
>>
>>> design
>>   t0hr t6hr t24hr t24p6hr
>> 1     1    0     0       0
>> 2     1    0     0       0
>> 3     1    0     0       0
>> 4     0    1     0       0
>> 5     0    1     0       0
>> 6     0    1     0       0
>> 7     0    0     1       0
>> 8     0    0     1       0
>> 9     0    0     1       0
>> 10    0    0     0       1
>> 11    0    0     0       1
>> 12    0    0     0       1
>>
>> arrayw<-arrayWeights(selDataMatrix,design=design)
>>> arrayw
>>         1         2         3         4         5         6         7         8         9
>> 1.6473168 1.2716081 1.5170375 1.0310794 1.1010048 1.2787543 0.8198722 0.7162097 2.3992850
>>        10        11        12
>> 0.1744961 1.3821469 0.6379648
>> ## Note array 10, which was an outlier in hierarchical clustering
>> ## (though it was still more similar to its biological replicates
>> ## than to any other arrays, based on genes where sd/mean > 0.1).
>>
>> ## Fit with array weights (the unweighted fit is shown further below)
>> fit <- lmFit(selDataMatrix, design, weights=arrayw)
>>
>> cont.matrix <- makeContrasts(
>> tchange6hr="t6hr-t0hr" ,
>> tchange24hr="t24hr-t0hr" ,
>> tchange24p6hr="t24p6hr-t0hr" ,
>> diff0to6="t6hr-t0hr" ,
>> diff6to24="t24hr-t6hr" ,
>> diff24to24p6="t24p6hr-t24hr" ,
>> levels=design)
>>
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2)
>>
>> ** Get
>>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2927
>>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
>> [1] 5263
>>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2083
>>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2927
>>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2931
>>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 3810
>>
>> ####################### AS OPPOSED TO THE TYPICAL:
>>
>> fit <- lmFit(selDataMatrix, design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2)
>>
>> ** Get
>>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1725
>>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
>> [1] 3438
>>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1512
>>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1725
>>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1605
>>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2610
>>
>> Is more differential expression always better? I guess so, unless
>> there are more false positives, right?  I am slightly worried that
>> using a linear model to assess array quality and produce weights
>> will naturally bias a method such as limma, which then uses a
>> linear model again to determine differential expression. After trying
>> a few different permutations (use weights; remove the "worst" arrays and
>> redo without weights), I think this is not a big concern, but I would
>> welcome some feedback from others and insight into how they are using
>> this function.
>>
>> Thanks
>> Paul
>