[BioC] negative correlation from limma's duplicateCorrelation

Wed Mar 13 01:17:30 CET 2013

Dear Ian,

The help page for duplicateCorrelation() says

"For this function to return statistically useful results, there must be 
at least two more arrays than the number of coefficients to be estimated, 
i.e., two more than the column rank of design."

You have just two arrays, only one more than the number of columns in the 
design matrix.  So there is not enough meaningful information from which 
to estimate the duplicate correlation.

The correlations that you are computing using cor() are not comparable to 
the quantity estimated by duplicateCorrelation.  You are computing 
correlations between pairs of spots for the same gene across different 
genes.  This correlation arises from the fact that different genes have 
different expression levels.  You would get a positive value for this 
correlation even if there were no spot duplicates.  The 
duplicateCorrelation() function on the other hand computes a correlation 
across technical replicates for the same gene.  These two correlations are 
not related.

Best wishes
Gordon

> Date: Mon, 11 Mar 2013 18:27:25 +0000
> From: Ian Sudbery <ian.sudbery at dpag.ox.ac.uk>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] negative correlation from limma's duplicateCorrelation
>
> Hi all,
>
> I might be misunderstanding something here, but I think I'm having 
> trouble with Limma's duplicateCorrelation function.
>
> I am analysing a set of custom spotted two color microarrays. Each 
> comparison contains two arrays. Each array contains two copies of each 
> spot. The design table for each comparison looks something like this:
>
>> targets
>                     SlideNumber             FileName   Cy3 Cy5       Date                 Name
> yeast_wt_v_dd_rep1           823  s823_bwp17y+ddy.txt   wty ddy 17/12/2012   yeast_wt_v_dd_rep1
> yeast_wt_v_dd_rep2           828     s828_wty+ddy.txt   wty ddy 18/01/2013   yeast_wt_v_dd_rep2
>
>
> After background correction and normalisation, I run 
> duplicateCorrelation before fitting the model:
>
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>
> When I look at the results, I find that the consensus correlation is 
> negative:
>
>> corfit$consensus.correlation
> [1] -0.4163954
>
> Surely this is wrong? Examining the correlation for duplicate spots on 
> each slide manually suggests a good correlation, particularly if one 
> ignores flagged spots:
>
>> good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1
>> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][good_spots[,1]])
> [1] 0.8769604
>
>> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][good_spots[,2]])
> [1] 0.858891
>
> Even without removing the flagged spots, the correlation is still positive:
>> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1])
> [1] 0.2669379
>
>> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2])
> [1] 0.1843577
>
> I know that this isn't the way that duplicateCorrelation calculates rho, 
> but one would expect that a good correlation in this way might indicate 
> at least a positive correlation from duplicateCorrelation? Or am I 
> interpreting the consensus correlation incorrectly?
>
> Here is the rest of my analysis script (I've left out some diagnostic plot generation):
>
>> targets <- readTargets(row.names="Name")
>> RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50))
>> spottypes <- readSpotTypes()
>> RG$genes$Status <- controlStatus(spottypes,RG)
>> anno <- read.delim("anno.txt", comment.char="!", header = T)
>> RG_background <- backgroundCorrect(RG,method = "normexp", offset =10)
>>
>> #flag out spots where intentity isn't much above background in both channels
>> RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0
>> MA <- normalizeWithinArrays(RG_background)
>> MA_norm <- MA[MA$genes$Status == "cDNA",]
>> MA_norm$genes$Status <- NULL
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>> fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324)
>> fit<- eBayes(fit)
>
>> sesssionInfo()
> R version 2.15.1 (2012-06-22)
>
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] dichromat_2.0-0   gplots_2.11.0     MASS_7.3-18       KernSmooth_2.23-7 caTools_1.14      gdata_2.12.0      gtools_2.7.0
> [8] reshape2_1.2.2    ggplot2_0.9.3     statmod_1.4.16    limma_3.14.4
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5       colorspace_1.2-1   digest_0.6.2       gtable_0.1.2       labeling_0.1       munsell_0.4        plyr_1.8
> [8] proto_0.3-10       RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_2.15.1
>
> Any advice appreciated
>
> Ian
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}