[BioC] Your thoughts on Limma Array Weights?
Mark Cowley
m.cowley0 at gmail.com
Fri Jun 27 03:37:14 CEST 2008
Hi Paul and Matt,
Have either of you compared situations with only a small number of
arrays in a two-group comparison, e.g. 2 vs 2 or 3 vs 3, where you either
throw one array away (due to QC) or just down-weight it?
I commonly throw the poor array away, but one data set that I'm
currently working on only has 2 vs 2 (read: "pilot experiment"), and
when you remove an array, it's 2 vs 1, which is not much fun.
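
To make the question concrete, here is a minimal sketch of the three options (equal weights, array weights, removal) on simulated data. Everything about the data is an assumption for illustration only: the matrix `y`, the 2 vs 2 layout, and the choice of array 4 as the noisy one are made up and do not come from a real experiment. The limma calls used are arrayWeights(), lmFit(), eBayes() and topTable().

```r
library(limma)

set.seed(1)
n_genes <- 1000

# Hypothetical log-expression matrix: 1000 genes, 2 vs 2 arrays, no true signal
y <- matrix(rnorm(n_genes * 4), nrow = n_genes,
            dimnames = list(paste0("g", seq_len(n_genes)), paste0("a", 1:4)))
y[, 4] <- y[, 4] + rnorm(n_genes, sd = 2)   # make array 4 artificially noisy

group  <- factor(c("A", "A", "B", "B"))
design <- model.matrix(~ group)             # columns: (Intercept), groupB

# Option 1: all arrays, equal weights
fit_equal <- eBayes(lmFit(y, design))

# Option 2: all arrays, down-weighted by estimated array quality
aw <- arrayWeights(y, design)
fit_weighted <- eBayes(lmFit(y, design, weights = aw))

# Option 3: drop the suspect array entirely (leaves 2 vs 1)
keep <- 1:3
fit_removed <- eBayes(lmFit(y[, keep], design[keep, , drop = FALSE]))

# Compare residual degrees of freedom and the top genes under each option
fits <- list(equal = fit_equal, weighted = fit_weighted, removed = fit_removed)
sapply(fits, function(f) f$df.residual[1])
lapply(fits, function(f) topTable(f, coef = "groupB", number = 5))
```

Removing the array leaves a single residual degree of freedom per gene (three arrays, two coefficients), which is the "not much fun" part; the weighted fit keeps both.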
cheers,
Mark
On 27/06/2008, at 9:53 AM, Matt Ritchie wrote:
> Dear Paul,
>
> I have noticed cases where the results are 'better' (i.e. you get more
> extreme moderated t-statistics or log-odds) if you remove suspect arrays.
> In one recent example I recall, the experimenter eventually discovered that
> the genotype of a sample hybridised to one of their arrays was not what they
> originally thought. This meant that the linear model they were fitting was
> not right. Although the weight assigned to this array was small, removing
> it from the analysis altogether still produced better results. The array
> weights method cannot correct for these kinds of gross errors.
>
> I usually take a try-it-and-see approach in my own analyses, similar to what
> you have done (i.e. run the analysis with equal weights, with array weights,
> or after removing any suspect arrays altogether, then look at the results of
> each to see which gives the most extreme statistics).
>
> Best wishes,
>
> Matt
>
>> I use limma quite a bit but had not really been using arrayWeights
>> much until recently.
>> I like it a lot but have found, in some cases, that it appears better
>> to just ditch the very poorly performing arrays and then proceed without
>> weighting.
>>
>> What is people's real-world experience with arrayWeights? Are you using
>> it routinely?
>>
>> For example, my typical usage: a time series with biological triplicates.
>>
>>> design
>>    t0hr t6hr t24hr t24p6hr
>> 1     1    0     0       0
>> 2     1    0     0       0
>> 3     1    0     0       0
>> 4     0    1     0       0
>> 5     0    1     0       0
>> 6     0    1     0       0
>> 7     0    0     1       0
>> 8     0    0     1       0
>> 9     0    0     1       0
>> 10    0    0     0       1
>> 11    0    0     0       1
>> 12    0    0     0       1
>>
>>> arrayw <- arrayWeights(selDataMatrix, design=design)
>>> arrayw
>>         1         2         3         4         5         6         7
>> 1.6473168 1.2716081 1.5170375 1.0310794 1.1010048 1.2787543 0.8198722
>>         8         9        10        11        12
>> 0.7162097 2.3992850 0.1744961 1.3821469 0.6379648
>>
>> ## Note array 10, which was an outlier in hierarchical clustering
>> ## (though it was still more similar to its biological replicates than
>> ## to any other arrays, based on genes where sd/mean > 0.1).
>>
>> fit <- lmFit(selDataMatrix, design, weights=arrayw)
>> ## fit <- lmFit(selDataMatrix, design)   # unweighted alternative; would overwrite the weighted fit
>>
>> cont.matrix <- makeContrasts(
>>   tchange6hr    = "t6hr-t0hr",
>>   tchange24hr   = "t24hr-t0hr",
>>   tchange24p6hr = "t24p6hr-t0hr",
>>   diff0to6      = "t6hr-t0hr",
>>   diff6to24     = "t24hr-t6hr",
>>   diff24to24p6  = "t24p6hr-t24hr",
>>   levels=design)
>>
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2)
>>
>> ** Get
>>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2927
>>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
>> [1] 5263
>>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2083
>>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2927
>>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2931
>>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 3810
>>
>> ####################### AS OPPOSED TO THE TYPICAL:
>>
>> fit <- lmFit(selDataMatrix, design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2)
>>
>> ** Get
>>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1725
>>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
>> [1] 3438
>>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1512
>>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1725
>>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 1605
>>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
>> [1] 2610
>>
>> Is more differential expression always better? I guess so, unless
>> there are more false positives, right? I am slightly worried that
>> using a linear model to assess array quality and produce weights
>> will naturally bias a method such as limma, which then uses a
>> linear model again to determine differential expression. After trying
>> a few different permutations (use weights; remove the "worst" arrays and
>> redo without weights), I think that this is not a big concern, but I would
>> welcome some feedback from others and insight into how they are using this
>> function.
>>
>> Thanks
>> Paul