[BioC] of limma and superfluous arrays

Yannick Wurm Yannick.Wurm at unil.ch
Fri Feb 1 11:51:18 CET 2008


Thanks Gordon!

cheers,
yannick

On Jan 31, 2008, at 0:21 , Gordon Smyth wrote:

> Dear Yannick,
>
> From a statistical point of view, you should include in your limma
> analysis any arrays you have which will share the same genewise
> variances as the arrays involved in your contrasts.
>
> How do you know which arrays share the same genewise variances? In
> practice, this means you should include arrays with very comparable
> RNA samples (same tissue, similar treatment), same probe set,
> collected and hybridised at the same time, i.e., arrays which really
> are part of the same greater experiment. Arrays more different than
> that should not be included.
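
A rough way to check whether extra arrays plausibly share the same genewise variances is to compare the typical genewise variance in the two sets of arrays. The sketch below is only an illustration in base R on simulated log-ratios, not a procedure taken from this thread: the matrices `core` and `extra`, the chosen standard deviations and the helper `genewise_var` are all hypothetical, and with real data one would more likely compare residual variances from separate model fits than raw variances.

```r
set.seed(1)
n_genes <- 2000

# Hypothetical log-ratios with true mean 0: 8 "core" arrays and 8 "extra" arrays.
# The extra arrays are simulated with twice the standard deviation.
core  <- matrix(rnorm(n_genes * 8, sd = 0.3), nrow = n_genes)
extra <- matrix(rnorm(n_genes * 8, sd = 0.6), nrow = n_genes)

# Sample variance of each gene (row) across the arrays (columns)
genewise_var <- function(m) apply(m, 1, var)

var_core  <- genewise_var(core)
var_extra <- genewise_var(extra)

# Ratio of typical genewise variances; values far from 1 suggest the two
# sets of arrays do not share the same genewise variances
ratio <- median(var_extra) / median(var_core)
print(ratio)
```

In this simulation the ratio comes out well above 1, which is the kind of signal that would argue against pooling the extra arrays into the same analysis.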
>
> Best wishes
> Gordon
>
>> Date: Tue, 29 Jan 2008 22:35:41 +0100
>> From: Yannick Wurm <Yannick.Wurm at unil.ch>
>> Subject: [BioC] of limma and superfluous arrays
>> To: bioconductor at stat.math.ethz.ch
>>
>> Dear List,
>>
>> I'm starting to do limma analyses on a small timecourse loop design
>> with 2-color cDNA chips as follows:
>>         0h vs 6h
>>         6h vs 24h
>>         24h vs 0h
>> Four biological replicates in one dye orientation (->), and then four
>> more biological replicates dye-swapped for balance (<-).
>>
>> My targets file begins like this (only the first two sets of three
>> listed):
>>         US22502600_F82_S01.gpr  A_0h    A_24h
>>         US22502600_F65_S01.gpr  A_24h   A_6h
>>         US22502600_F153_S01.gpr A_6h    A_0h
>>         US22502600_F85_S01.gpr  F_0h    F_6h
>>         US22502600_F60_S01.gpr  F_24h   F_0h
>>         US22502600_F72_S01.gpr  F_6h    F_24h
>>         ... with eight such sets of three.
>>
>> But then I also have some chips -> against our lab's "standard"
>> reference RNA:
>>         US22502600_F67_S01.gpr  A_24h   Ref
>>         US22502600_F83_S01.gpr  F_24h   Ref
>>         ... and six more
>>
>> For my limma analysis, I have three options:
>>         *a*: use only the minimal number of chips (i.e. each loop of
>> three, and nothing to connect the loops). In this case, limma is unable
>> to estimate one parameter in each small loop (e.g. the 6h timepoint). I
>> can ask how many genes are differentially expressed between 24h and 0h:
>>> design.noref = modelMatrix(targets.noref, ref="A_0h")
>>> fit.noref = lmFit(MA.noref.p, design.noref)
>>> cont.matrix = makeContrasts(T24_T0 =
>> (A_24h+C_24h+F_24h+K_24h+N_24h+Q_24h+R_24h+T_24h
>> -C_0h-F_0h-K_0h-N_0h-Q_0h-R_0h-T_0h)/8,
>> levels=design.noref)
>>> fit.noref2= contrasts.fit(fit.noref, cont.matrix)
>>> fit.noref2=eBayes(fit.noref2)
>>> summary(topTable(fit.noref2,n=10000)$adj.P.Val<=0.05)
>>
>>         ---> I get 3668 differentially expressed spots.
>>
>>         *b*: provide my "24h" vs Ref chips as well
>>                 using ref="Ref" in my design and
>>> cont.matrix = makeContrasts(T24_T0 =
>> (A_24h+C_24h+F_24h+K_24h+N_24h+Q_24h+R_24h+T_24h
>> -A_0h-C_0h-F_0h-K_0h-N_0h-Q_0h-R_0h-T_0h)/8,
>> levels=design)
>>
>>         ---> I get 3796 differentially expressed spots.
>>
>>
>>         *c*: use those in *b*, as well as eight additional chips done in
>> parallel, that are XXX vs Ref. The XXX samples don't connect to
>> anything other than Ref (they're superfluous).
>>
>>         ---> I get 3583 differentially expressed spots.
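
The missing parameter in option *a* can be illustrated with a small rank calculation. This is a base R sketch under stated assumptions, not the matrix that `modelMatrix` actually builds: it assumes the two sample columns of the targets file are Cy3 then Cy5, codes each array as log2(Cy5/Cy3), and uses hypothetical row labels for the three arrays of the F loop. An isolated loop of three arrays has rank 2, so only differences between its timepoints are estimable. Adding the F_24h vs Ref array (with Ref as the baseline) raises the rank to 3.

```r
# One row per array, one column per RNA sample in the loop:
# +1 for the Cy5 sample, -1 for the Cy3 sample (assumed column order)
loop <- rbind(
  F85 = c(F_0h = -1, F_6h =  1, F_24h =  0),  # F_0h  vs F_6h
  F60 = c(F_0h =  1, F_6h =  0, F_24h = -1),  # F_24h vs F_0h
  F72 = c(F_0h =  0, F_6h = -1, F_24h =  1)   # F_6h  vs F_24h
)

# The rows sum to zero, so one of the three coefficients cannot be estimated
rank_loop <- qr(loop)$rank

# Add the F_24h vs Ref array; Ref is the baseline, so it has no column
with_ref <- rbind(loop, F83 = c(0, 0, -1))
rank_with_ref <- qr(with_ref)$rank

print(c(loop_only = rank_loop, with_ref = rank_with_ref))
```

The contrast between 24h and 0h only involves differences within a loop, which is why it can still be tested in option *a* even though the design is rank deficient.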
>>
>> Searching the archives, several posts mentioned that providing more
>> chips gives limma a better estimation of variance. Thus it makes
>> sense to provide more. And doing so finds more differentially
>> expressed genes in *b* than in *a*.
>> But would it then be defensible to give limma all the chips I did in
>> that batch? All the chips I've ever done?
>>
>> And then I get a smaller number of differentially expressed spots in
>> *c* than in *b*, which surprises me, because using more chips should
>> make my estimate of variance more precise. Comparing *b* with *c*
>> leads me to conclude that the chips I've added to the analysis in *c*
>> are funky because they increase estimates of variance, or that the
>> chips in *b* show artificially low variance.
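
The effect described here can be reproduced in a small simulation. The following base R sketch is an illustration under assumptions, not limma itself: it uses ordinary pooled-variance t-statistics rather than limma's moderated ones, all data are simulated, and `count_hits`, `core`, `extra_same` and `extra_noisy` are hypothetical names. The extra arrays are assumed to share a single mean of their own, so they add nothing to the estimate of the contrast and only contribute residual degrees of freedom and residual sums of squares.

```r
set.seed(42)
n_genes <- 5000
n <- 8                                              # arrays per group
effect <- c(rep(0.5, 500), rep(0, n_genes - 500))   # 500 genes truly change

# Core arrays measure the contrast of interest (row i has mean effect[i])
core <- matrix(rnorm(n_genes * n, mean = effect, sd = 0.4), nrow = n_genes)

# Extra arrays unrelated to the contrast: same variance, or inflated variance
extra_same  <- matrix(rnorm(n_genes * n, sd = 0.4), nrow = n_genes)
extra_noisy <- matrix(rnorm(n_genes * n, sd = 0.8), nrow = n_genes)

count_hits <- function(core, extra = NULL, alpha = 0.05) {
  est <- rowMeans(core)                    # contrast estimate per gene
  rss <- rowSums((core - est)^2)           # residual sum of squares, core arrays
  df  <- ncol(core) - 1
  if (!is.null(extra)) {
    # Extra arrays get their own mean; they only add residual information
    rss <- rss + rowSums((extra - rowMeans(extra))^2)
    df  <- df + ncol(extra) - 1
  }
  s2     <- rss / df                       # pooled genewise variance
  t_stat <- est / sqrt(s2 / ncol(core))
  p      <- 2 * pt(-abs(t_stat), df)
  sum(p.adjust(p, method = "BH") <= alpha) # genes significant at 5% FDR
}

hits_core  <- count_hits(core)
hits_same  <- count_hits(core, extra_same)
hits_noisy <- count_hits(core, extra_noisy)
print(c(core_only = hits_core, same_variance = hits_same, noisy = hits_noisy))
```

Under these assumptions, extra arrays with inflated variance raise the pooled variance and reduce the number of significant genes relative to extra arrays with matching variance, which is consistent with the drop from *b* to *c*.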
>>
>> Does this make sense?
>> Obviously, in this analysis my numbers of differentially expressed
>> genes are quite similar in these three cases, and 5% more or fewer
>> significant spots probably won't make a difference. But it would be
>> good to know what is most valid for future analyses as well.
>>
>>
>> Thanks and regards,
>>
>> yannick
>>
>>
>>
>> --------------------------------------------
>>           yannick . wurm @ unil . ch
>> Ant Genomics, Ecology & Evolution @ Lausanne
>>    http://www.unil.ch/dee/page28685_fr.html


