[BioC] duplicateCorrelation

Fri Nov 18 05:54:55 CET 2005

Dear Devin,

There are a couple of problems. Firstly, you've told us that your 
replicates are 112 spots apart, but you haven't told limma this, so the 
software is assuming the default: that the replicates are in adjacent 
spots (spacing=1). You need instead:

 > cor <- duplicateCorrelation(MA, design, ndups=3, spacing=112)
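To see why spacing matters, here is a base-R sketch of the reshape that, as far as I can tell from limma's source, unwrapdups() (and hence duplicateCorrelation()) uses to group duplicate spots. The name unwrap_sketch and the toy IDs are hypothetical, for illustration only. With the right spacing, each row holds one gene's three copies. With the default spacing=1, it groups three neighbouring spots, which here are different genes.

```r
# Hypothetical base-R sketch (not limma's own code) of how duplicate spots are
# grouped: spots are taken in blocks of ndups*spacing, and within a block the
# spots at offsets i, i+spacing, i+2*spacing, ... are copies of one gene.
unwrap_sketch <- function(x, ndups = 2, spacing = 1) {
  x <- as.matrix(x)
  nspots <- nrow(x)
  nslides <- ncol(x)
  ngroups <- nspots / ndups / spacing
  stopifnot(ngroups == round(ngroups))  # spots must fill whole blocks
  dim(x) <- c(spacing, ndups, ngroups, nslides)
  x <- aperm(x, c(1, 3, 2, 4))          # put the replicate index after the gene indices
  dim(x) <- c(spacing * ngroups, ndups * nslides)
  x                                     # one row per gene
}

# Toy layout: two blocks of 4 genes, each gene printed 3 times, 4 spots apart.
ids <- c(rep(c("a", "b", "c", "d"), times = 3),
         rep(c("e", "f", "g", "h"), times = 3))
unwrap_sketch(ids, ndups = 3, spacing = 4)  # three identical columns
unwrap_sketch(ids, ndups = 3, spacing = 1)  # rows mix different genes
```

On your arrays the same idea applies with ndups=3 and spacing=112.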

Secondly, two arrays is pretty minimal to estimate duplicate correlations. 
The help page for duplicateCorrelation says:

      For this function to return statistically useful results, there
      must be at least two more arrays than the number of coefficients
      to be estimated, i.e., two more than the column rank of 'design'.

Your design has a single column, so its rank is 1. Hence you need at least 
3 arrays to have confidence in your results, whereas you have only two.
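The rule is easy to check in R. min_arrays_needed is a hypothetical helper name; qr()$rank is base R's numerical column rank, and each row of a limma design matrix corresponds to one array.

```r
# Rule from the duplicateCorrelation() help page: use at least
# (column rank of design) + 2 arrays.
min_arrays_needed <- function(design) qr(as.matrix(design))$rank + 2L

design <- cbind(c(1, -1))                  # the dye-swap design from the question
min_arrays_needed(design)                  # 3
nrow(design) >= min_arrays_needed(design)  # FALSE: one row per array, only 2
```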

If you want to check that duplicateCorrelation() is getting the right 
input, the best way is to check that your replicates really are at the 
spacing you think they are. Your data files (ScanArray?) almost certainly 
contain a gene ID column. Let's assume this column is called "ID". Use

 > RG <- read.maimages(..., annotation="ID")

Then

 > unwrapdups(MA$genes$ID, ndups=3, spacing=112)

is a matrix which should have three identical columns. Does it?
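If the matrix is too long to inspect by eye, a small check can count the rows whose IDs disagree. dup_mismatch_rate is a hypothetical helper, not part of limma; the commented line shows how it might be applied to the unwrapdups() output above.

```r
# Hypothetical helper (not part of limma): given a genes x ndups matrix of IDs,
# such as unwrapdups() returns for a single ID column, report the fraction of
# rows whose replicate IDs do not all agree.
dup_mismatch_rate <- function(ids) {
  ids <- as.matrix(ids)
  disagree <- apply(ids, 1, function(r) length(unique(r)) > 1L)
  mean(disagree)
}

# Possible usage with limma (the ID column must have been read in):
# dup_mismatch_rate(unwrapdups(MA$genes$ID, ndups = 3, spacing = 112))
# A value of 0 means every gene's three spots carry the same ID.
```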

Best wishes
Gordon

>[BioC] duplicateCorrelation
>Devin Scannell scannedr at tcd.ie
>Fri Nov 18 02:03:07 CET 2005
>
>Hi,
>
>this is not a very interesting question but it has given me enough
>trouble to get me to mail the list so I hope somebody has time to
>reply.
>
>I have several two-colour arrays to analyze. Each probe is present
>three times on each chip and they are spaced 112 spots apart (not my
>decision). The consensus correlation returned by duplicateCorrelation
>is typically around zero which is surprising since the spots are close
>together and the data looks good in MA plots (even before
>normalization). A histogram of the individual correlations
>(cor$all.correlations from duplicateCorrelation) supports the
>conclusion that the within-chip replicates are poorly correlated.
>
>I am concerned that the numbers that are being handed to
>duplicateCorrelation are incorrect somehow but I am not sure what I am
>doing wrong (code below). I have looked at the code for
>duplicateCorrelation and cannot follow it so I was wondering if anyone
>can suggest a way to verify the correlations it is calculating. Ideally
>I would like to be able to select a specific gene, calculate the
>correlation between replicates myself and verify that this is the same
>as I obtain from duplicateCorrelation.
>
>Thanks in advance,
>Devin

>library(limma)
>
>targets <- readTargets()
>
>targets
>     SlideNumber     Name FileName  Cy3  Cy5
>13          13 60H_9:12   13.csv  WT1 60H1
>17          17 60H_12:9   17.csv 60H1  WT1
>
>flag.check <- function(x) as.numeric(x$Flags >= 3)
>RG <- read.maimages(targets$FileName, sep=",",
>    columns=list(Rf="Ch1 Median", Gf="Ch2 Median", Rb="Ch1 B Median", Gb="Ch2 B Median"),
>    wt.fun=flag.check)
>
>RG$genes <- readGAL()
>RG$printer <- getLayout(RG$genes)
>
>RG.bgc <- backgroundCorrect(RG, method="normexp", offset=50)
>MA <- normalizeWithinArrays(RG.bgc, method="loess")
>
>design <- cbind(c(1,-1))
>cor <- duplicateCorrelation(MA, design, ndups=3)
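On the question of computing one gene's correlation by hand: below is a rough sketch using the classical one-way ANOVA (moment) estimate of the intraclass correlation. This is a swapped-in technique; as I understand it duplicateCorrelation() fits a per-gene mixed model by REML and then combines genes into a consensus, so the numbers need not agree exactly with cor$all.correlations. The name manual_dup_cor is hypothetical, and it assumes a single-column design whose entries are +1 or -1, as in the dye-swap above.

```r
# Hypothetical helper (not part of limma): one-way ANOVA (moment) estimate of
# the within-array correlation between duplicate spots for ONE gene.
#   M    : ndups x narrays matrix of log-ratios (row = spot, column = array)
#   sign : single-column design, one +1/-1 entry per array
# Multiplying by sign puts dye-swapped arrays on a common scale, giving the
# model y[k, j] = beta + a_j + e_kj with a_j a random array effect.
manual_dup_cor <- function(M, sign) {
  y <- sweep(as.matrix(M), 2, sign, `*`)
  k <- nrow(y)                                         # spots per gene (ndups)
  J <- ncol(y)                                         # number of arrays
  array_means <- colMeans(y)
  grand_mean <- mean(y)
  msb <- k * sum((array_means - grand_mean)^2) / (J - 1)  # between-array mean square
  msw <- sum(sweep(y, 2, array_means)^2) / (J * (k - 1))  # within-array mean square
  (msb - msw) / (msb + (k - 1) * msw)
}
```

For a real gene in row g one might pass matrix(unwrapdups(MA$M, ndups=3, spacing=112)[g, ], nrow=3) with design[, 1], assuming (as the reshape suggests) that unwrapdups() orders columns by replicate within array. With only two arrays the between-array mean square has a single degree of freedom, so any per-gene estimate will be very noisy.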