[BioC] negative correlation from limma's duplicateCorrelation
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Mar 13 01:17:30 CET 2013
Dear Ian,
The help page for duplicateCorrelation() says
"For this function to return statistically useful results, there must be
at least two more arrays than the number of coefficients to be estimated,
i.e., two more than the column rank of design."
You have just two arrays, only one more than the number of columns in the
design matrix. So there is not enough meaningful information from which
to estimate the duplicate correlation.
The correlations that you are computing using cor() are not comparable to
the quantity estimated by duplicateCorrelation. You are computing
correlations between pairs of spots for the same gene across different
genes. This correlation arises from the fact that different genes have
different expression levels. You would get a positive value for this
correlation even if there were no spot duplicates. The
duplicateCorrelation() function on the other hand computes a correlation
across technical replicates for the same gene. These two correlations are
not related.
Best wishes
Gordon
> Date: Mon, 11 Mar 2013 18:27:25 +0000
> From: Ian Sudbery <ian.sudbery at dpag.ox.ac.uk>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] negative correlation from limma's duplicateCorrelation
>
> Hi all,
>
> I might be misunderstanding something here, but I think I'm having
> trouble with Limma's duplicateCorrelation function.
>
> I am analysing a set of custom spotted two color microarrays. Each
> comparison contains two arrays. Each array contains two copies of each
> spot. The design table for each comparison looks something like this:
>
>> targets
> SlideNumber FileName Cy3 Cy5 Date Name
> yeast_wt_v_dd_rep1 823 s823_bwp17y+ddy.txt wty ddy 17/12/2012 yeast_wt_v_dd_rep1
> yeast_wt_v_dd_rep2 828 s828_wty+ddy.txt wty ddy 18/01/2013 yeast_wt_v_dd_rep2
>
>
> After background correction and normalisation, I run
> duplicateCorrelation before fitting the model:
>
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>
> When I look at the results, I find that the consensus correlation is
> negative:
>
>> corfit$consensus.correlation
> [1] -0.4163954
>
> Surely this is wrong? Examining the correlation for duplicate spots on
> each slide manually suggests a good correlation, particularly if one
> ignores flagged spots:
>
>> good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1
>> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][good_spots[,1]])
> [1] 0.8769604
>
>> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][good_spots[,2]])
> [1] 0.858891
>
> Even without removing the flagged spots, the correlation is still positive:
>> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1])
> [1] 0.2669379
>
>> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2])
> [1] 0.1843577
>
> I know that this isn't the way that duplicateCorrelation calculates rho,
> but one would expect that a good correlation in this way might indicate
> at least a positive correlation from duplicateCorrelation? Or am I
> interpreting the consensus correlation incorrectly?
>
> Here is the rest of my analysis script (I've left out some diagnostic plot generation):
>
>> targets <- readTargets(row.names="Name")
>> RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50))
>> spottypes <- readSpotTypes()
>> RG$genes$Status <- controlStatus(spottypes,RG)
>> anno <- read.delim("anno.txt", comment.char="!", header = T)
>> RG_background <- backgroundCorrect(RG,method = "normexp", offset =10)
>>
>> #flag out spots where intentity isn't much above background in both channels
>> RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0
>> MA <- normalizeWithinArrays(RG_background)
>> MA_norm <- MA[MA$genes$Status == "cDNA",]
>> MA_norm$genes$Status <- NULL
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>> fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324)
>> fit<- eBayes(fit)
>
>> sesssionInfo()
> R version 2.15.1 (2012-06-22)
>
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] dichromat_2.0-0 gplots_2.11.0 MASS_7.3-18 KernSmooth_2.23-7 caTools_1.14 gdata_2.12.0 gtools_2.7.0
> [8] reshape2_1.2.2 ggplot2_0.9.3 statmod_1.4.16 limma_3.14.4
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5 colorspace_1.2-1 digest_0.6.2 gtable_0.1.2 labeling_0.1 munsell_0.4 plyr_1.8
> [8] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15.1
>
> Any advice appreciated
>
> Ian
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list