[BioC] Limma: how to combine duplicateCorrelation, dyeeffect and arrayweights?

Gordon Smyth smyth at wehi.EDU.AU
Mon Nov 26 02:14:16 CET 2007

At 06:05 AM 26/11/2007, dorthe.belgardt at medisin.uio.no wrote:
>Dear Gordon,
>thank you very much for your reply. The reason why I filtered out so many
>spots is that the overall quality of my hybridizations is not that good,
>rather "okay". I felt uncomfortable keeping spots in the analysis which I
>considered not to be reliable. So I decided to filter out weak and
>inhomogeneous spots and I was told that it is not uncommon to lose so many
>datapoints by filtering.

Not in my neck of the woods.

>I have another dataset of 2-colour-cDNA-arrays, where I applied the same
>filtering and also flagged about 70% of the spots per array. If I compare
>the topTables doing the analysis A) using spotweights and B) without using
>spotweights the overlap is close to 100%, meaning that the top100 genes I
>get using spotweights also show significant DE in the other analysis. On
>the other hand, if I use no spotweights and keep all the spots in the
>analysis process, I end up with genes in my toptables which I would
>consider to be not reliable and which received a negative flagging in
>GenePix. And as the topTables do not differ that much, I was wondering in
>what way a stringent filtering could "lead to problems down the track"?

The problem is not that your filtering is stringent but rather that 
your filtering is not based on quality criteria.

It is perfectly acceptable to filter out probes which appear not to 
be expressed in any experimental condition. This would have the 
effect that you want to achieve. But this means that you must remove 
those probes entirely from your analysis, not that you selectively 
remove some values for those probes and leave others in.
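
As a minimal sketch of the distinction, using simulated data in base R (the names A, M and cutoff and the threshold value are hypothetical, not taken from your analysis):

```r
set.seed(42)
n_probes <- 1000
n_arrays <- 6

# Hypothetical log2 average intensities: each probe has its own level
probe_level <- rnorm(n_probes, mean = 8, sd = 2)
A <- probe_level + matrix(rnorm(n_probes * n_arrays, sd = 0.5), nrow = n_probes)
M <- matrix(rnorm(n_probes * n_arrays), nrow = n_probes)  # matching log-ratios

cutoff <- 7  # assumed "not expressed" threshold; choose it from your own data

# Probe-level filtering: keep a probe if it is above the cutoff on at
# least one array, otherwise drop the whole row.
expressed <- rowSums(A > cutoff) >= 1
M_kept <- M[expressed, , drop = FALSE]

# Spot-level filtering (the approach to avoid): blank out single values.
M_spotwise <- M
M_spotwise[A <= cutoff] <- NA

# Number of probes left with only part of their values
sum(rowSums(is.na(M_spotwise)) > 0 & rowSums(!is.na(M_spotwise)) > 0)
```

In the first version every retained probe keeps all six values; in the second, some probes are left with an incomplete, intensity-selected subset.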

If you remove individual spots merely because they are low intensity, 
this potentially leads to a variety of problems and biases. Suppose, 
as in your example, that a particular gene is expressed in WT but absent in 
the mutant. You might miss this important result entirely because you 
filtered out all the spots in the mutant. More pervasively, this 
selective filtering introduces biases into any statistical analysis 
of the affected genes, because the filtering is based on the 
intensity value itself.
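
The bias can be seen in a toy simulation. This is only an illustration in base R with made-up numbers (true_mean, cutoff and the normal error model are all assumptions), not a model of any real array:

```r
set.seed(1)

true_mean <- 7   # assumed true log2 intensity of one gene
cutoff    <- 7   # assumed filtering threshold, right at the gene's level

# 10000 hypothetical replicate measurements of the same gene
intensity <- rnorm(10000, mean = true_mean, sd = 1)

# Keep only the spots that pass the intensity filter
kept <- intensity[intensity > cutoff]

mean(intensity)  # close to the true mean
mean(kept)       # shifted upwards, because low values were removed
```

The surviving values overestimate the gene's level simply because the selection rule depends on the quantity being measured.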

Removing all your low intensity spots prior to normalisation also 
plays havoc with loess normalisation, which expects to see the whole 
intensity range. The loess curve may now be uncertainly estimated.
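
A rough illustration of the point, using the loess function from base R on simulated data. This is not limma's normalization code; the trend, the noise level, the span and the cutoff are all invented for the example:

```r
set.seed(2)
n <- 2000
A <- runif(n, 4, 14)                             # log2 average intensity
M <- 2 * exp(-(A - 4) / 2) + rnorm(n, sd = 0.3)  # made-up intensity-dependent trend
spots <- data.frame(A = A, M = M)
cutoff <- 7                                      # assumed low-intensity filter

# Loess curve from all spots, and from the filtered spots only
fit_all   <- loess(M ~ A, data = spots, span = 0.4)
fit_trunc <- loess(M ~ A, data = subset(spots, A > cutoff), span = 0.4)

# Standard error of the fitted curve just above the cutoff
at <- data.frame(A = cutoff + 0.2)
se_all   <- predict(fit_all,   newdata = at, se = TRUE)$se.fit
se_trunc <- predict(fit_trunc, newdata = at, se = TRUE)$se.fit
c(all = se_all, truncated = se_trunc)

# Below the cutoff the truncated fit gives no curve at all:
# predict() does not extrapolate by default and returns NA.
predict(fit_trunc, newdata = data.frame(A = 5))
```

After filtering, the low end of the retained range becomes a boundary of the fit, where the curve is supported by data on one side only.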

>But beside that, I would be thankful for some more advice how to handle the
>duplicateCorrelation function - or better how to interpret its result.  I
>ran two microarray experiments using basically the same design, but one
>array is printed in duplicates, the second in singlets only. My design
>matrix looks like this:
>  design
>        A  B  C  D
>  [1,]  1  0  0  0
>  [2,] -1  0  0  0
>  [3,]  1  0  0  0
>  [4,] -1  0  0  0
>  [5,]  1  0  0  0
>  [6,] -1  0  0  0
>  [7,]  0  1  0  0
>  [8,]  0 -1  0  0
>  [9,]  0  1  0  0
>[10,]  0 -1  0  0
>[11,]  0  1  0  0
>[12,]  0 -1  0  0
>[13,]  0  0  1  0
>[14,]  0  0 -1  0
>[15,]  0  0  1  0
>[16,]  0  0 -1  0
>[17,]  0  0  1  0
>[18,]  0  0 -1  0
>[19,]  0  0  0  1
>[20,]  0  0  0 -1
>[21,]  0  0  0  1
>[22,]  0  0  0 -1
>[23,]  0  0  0  1
>[24,]  0  0  0 -1
>For the arrays printed in singlets I tried to estimate the correlation for
>replicated arrays, using:
> > biolrep=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
>cor=duplicateCorrelation(FCmq, design=design, ndups=1, block=biolrep)
>FCmq contains the bg corrected, normalized data. The cor$consensus I get is:
>[1] 0.1396215
>According to what I read in the LimmaGuide this value should be negative
>due to the dyeswap design. What do I do if I get a positive value here?
>Should I interpret it as meaning that there is no correlation, treat all
>the arrays as independent and simply skip using the dupCor function?
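
For reference, a minimal sketch of how a block correlation estimated by duplicateCorrelation is usually carried into the linear model fit, whatever its sign. FCmq here is simulated noise standing in for the real normalized log-ratios, and corfit and n_genes are hypothetical names; design and biolrep reproduce the ones quoted above.

```r
library(limma)

set.seed(1)
n_genes <- 200

# Design from the post: four comparisons (A-D), six dye-swapped arrays each
design <- kronecker(diag(4), matrix(rep(c(1, -1), 3), ncol = 1))
colnames(design) <- c("A", "B", "C", "D")

# One block per pair of arrays, as in the post
biolrep <- rep(1:12, each = 2)

# Simulated stand-in for the normalized log-ratios (pure noise)
FCmq <- matrix(rnorm(n_genes * nrow(design)), nrow = n_genes)

# Estimate the within-block correlation
corfit <- duplicateCorrelation(FCmq, design = design, ndups = 1, block = biolrep)
corfit$consensus.correlation

# Pass the same blocks and the estimated correlation to lmFit
fit <- lmFit(FCmq, design, block = biolrep,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
topTable(fit, coef = "A", number = 5)
```

Fitting the same data with plain lmFit(FCmq, design), which treats the arrays as independent, and comparing the two topTables is one way to see how much the block correlation matters for a given dataset.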


>Something similar happens when I use the duplicate correlation on the
>other dataset for estimating the correlation of within-array-replicates.
>Using the same designMatrix, I get
> > cor$consensus
>[1] 0.2723845
>The Limma guide says that this value should be greater than 0.4.

I have checked through the guide just now but cannot find such a 
statement. Where is it?

>  What do I
>do if the correlation is below 0.4?

I suppose that you use what you have.

>  And do I understand it correctly that
>the within-array-correlation is not "influenced" by the dyeswap design, so
>that I can expect to get a positive value here?



>I'd be very happy if I could get some help for these questions as well!
>Thanks a lot for your time and best regards,
