[BioC] Limma: how to combine duplicateCorrelation, dyeeffect and arrayweights?

Fri Nov 23 04:24:53 CET 2007

Dear Dorthe,

The arrayWeights() function does allow you to combine array weights 
with spot weights. You are expected to combine the weight analysis 
with the most correct design matrix, so including a dye-swap effect 
in the design matrix is perfectly appropriate. So, no problems so far.

On the other hand, any spot weight system which removes 70% of your 
data is bound to lead to problems down the track, whatever you do. 
Why do you feel you need to do this?

Most of the spots which are flagged by GenePix are flagged simply 
because they are faint, rather than because of any other quality 
considerations. Since you are using normexp, which prevents low 
intensity spots giving over-variable log-ratios, there is no good 
reason for you to filter out these spots. In most case it would be 
better to simply ignore the GenePix flags, and keep all your spots in 
the analysis.

If you feel uncomfortable with this, you could check the GenePix 
flags to see if any spots are flagged for reasons other than 
faintness (Flag < -50). But please don't filter out spots just 
because they are faint. This goes against all Bioconductor recommendations.

Best wishes
Gordon

>Date: Wed, 7 Nov 2007 17:21:09 +0100 (CET)
>From: dorthe.belgardt at medisin.uio.no
>Subject: [BioC] Limma: how to combine duplicateCorrelation, dyeeffect
>         and arrayweights?
>To: bioconductor at stat.math.ethz.ch
>
>Hi,
>
>I am quite insecure if some parts of the analyis I did in Limma are really
>correct and I would highly appreciate if someone could take a look and
>give advice. My main concern is that I may not use the
>duplicatecorrelation, dyeeffect,arrayweights and spotweights correctly.
>
>The arrays I use are printed in duplicates with a spacing of 15000 (so
>30000 features in total), and I did the imageprocessing in GenePixPro6.1.
>Thereby I flagged all spots close to backgroundsignal and with a rgn r2
><0.5 bad, and only 30% of my data remain unflagged.
>
>And this is what I did using Limma:
>
> > targets=readTargets("Targets_basicSat.txt")
> > targets
>    SlideNumber          FileName Cy3 Cy5
>1            1 3096_basicSat.gpr ref   A
>2            2 3079_basicSat.gpr   A ref
>3            3 3089_basicSat.gpr ref   A
>4            4 3081_basicSat.gpr   A ref
>5            5 3071_basicSat.gpr ref   B
>6            6 3082_basicSat.gpr   B ref
>7            7 3085_basicSat.gpr ref   B
>8            8 8268_basicSat.gpr   B ref
>9            9 7829_basicSat.gpr ref   C
>10          10 3086_basicSat.gpr   C ref
>11          11 7823_basicSat.gpr ref   C
>12          12 7826_basicSat.gpr   C ref
>13          13 3090_basicSat.gpr ref   D
>14          14 3091_basicSat.gpr   D ref
>15          15 3092_basicSat.gpr ref   D
>16          16 7827_basicSat.gpr   D ref
>
>Every other slide is a dyeswapped technical replicate and per "group"
>(A,B,C,D) there are 2 biological replicates.
>
> > K=read.maimages(targets$FileName, source="genepix.median",
>wt.fun=wtflags(0))
> > types=readSpotTypes("SpottypesGAPDH.txt")
> > Status=controlStatus(types, K)
> > K$genes$Status=Status
> > K3=backgroundCorrect(K, method=?normexp?, offset=50)
> > K3=normalizeWithinArrays(K3, method="median")
> > K3a=normalizeBetweenArrays(K3, method="quantile")
> > design=modelMatrix(targets, ref="ref")
> > design
>       A  B  C  D
>  [1,]  1  0  0  0
>  [2,] -1  0  0  0
>  [3,]  1  0  0  0
>  [4,] -1  0  0  0
>  [5,]  0  1  0  0
>  [6,]  0 -1  0  0
>  [7,]  0  1  0  0
>  [8,]  0 -1  0  0
>  [9,]  0  0  1  0
>[10,]  0  0 -1  0
>[11,]  0  0  1  0
>[12,]  0  0 -1  0
>[13,]  0  0  0  1
>[14,]  0  0  0 -1
>[15,]  0  0  0  1
>[16,]  0  0  0 -1
>
>Since I am expecting a non-negligible dyeeffect I created an other
>designmatrix and the following contrastMatrix:
>
> >design1=cbind(DyeEffect=1, design)
> >design.cont=makeContrasts("A", ?B?, ?A-B", levels=design1)
>
>Next I estimate the correlation of within-array-duplicates:
>
> >cor=duplicateCorrelation(K3b, design=design1, ndups=2, spacing=15000,
>weights=K3b$weights)
>
>My first question is: is it correct to use here the designmatrix for the
>dyeeffect (design1 in this case)?
>
>When fitting the linear model, I also want to use arrayweights, combined
>with spotweights. So I gave following commands:
>
> > aw=arrayWeights(K3b, design=design1)
> > w=matvec(K3b$weights, aw)
>
>Again the question: is it correct to use here the "design1"-matrix
>considering the dyeeffect?
>
>Then I fit the linear model:
>
> >fit=lmFit(K3b, design=design1, ndups=2, spacing=15000, cor=cor$consensus,
>weights=w)
> >fit1=contrasts.fit(fit, design.cont)
> >eb=eBayes(fit1)
>
>Another thing I am worried about is that taking into account the dyeeffect
>plus arrayweights plus spotweights might be a bit "too much"? Like in a
>way "overtransforming" my data? Especially since approx 70% of my data
>have a spotweight of zero. Might it be better to use the spotweight of 0,1
>for bad spots, so that I do not loose the data completely?
>
>My apologies for this long email, I tried hard to find out the answers for
>myself reading the limmaguide and lots of other documents I found
>googleing, but still feel quite "stuck" in my analysis process.
>
>Thanks very much for any kind of help in advance!
>Best regards
>Dorthe
>
>
>--
>Dorthe Belgardt
>Institute of Basic Medical Sciences
>Department of Physiology
>P.O. Box 1103 Blindern
>0317 Oslo
>Norway