[BioC] limma Question: Design matrix for imperfectly paired samples

Sat May 12 09:23:13 CEST 2007

>Message: 21
>Date: Thu, 10 May 2007 14:10:58 -0400
>From: Xia Han <xia.han at bms.com>
>Subject: [BioC] limma Question: Design matrix for imperfectly paired
>         samples
>To: bioconductor at stat.math.ethz.ch
>Message-ID: <46436032.5070102 at bms.com>
>Content-Type: text/plain
>
>Dear List:
>
>I'm using limma package on an affy dataset. I have pre- and post-
>treatment biopsies, and some biopsies came from different regions of
>tumors and were profiled more than once, e.g. sample 01_01_Post (so they
>should be technical replicates?). There are some samples missing pre- or
>post- counterparts. The phenoData likes this:
>
>         Subject         Treatment       Subject.Treat   Pairs
>1       01_01   Pre     01_01_Pre       1
>2       01_01   Post    01_01_Post      1
>3       01_01   Post    01_01_Post      1
>4       01_02   Pre     01_02_Pre       NA
>5       02_01   Pre     02_01_Pre       2
>6       02_01   Post    02_01_Post      2
>7       02_02   Pre     02_02_Pre       3
>8       02_02   Post    02_02_Post      3
>9       02_03   Pre     02_03_Pre       NA
>10      03_01   Pre     03_01_Pre       4
>11      03_01   Post    03_01_Post      4
>12      03_02   Pre     03_02_Pre       5
>13      03_02   Pre     03_02_Pre       5
>14      03_02   Post    03_02_Post      5
>15      03_03   Post    03_03_Post      NA
>16      03_04   Pre     03_04_Pre       6
>17      03_04   Pre     03_04_Pre       6
>18      03_04   Post    03_04_Post      6
>19      03_05   Pre     03_05_Pre       7
>20      03_05   Post    03_05_Post      7
>
>
>The purpose is to detect treatment effects with more statistical power.
>I have 2 ways of building matrix as below:
>/==================================
>#First possibility:
>//design <- model.matrix(Treatment)
>corfit <- duplicateCorrelation(data, design=design, ndups=1, block=Subject)
>fit <- lmFit(data, design, block=Subject, cor=corfit$consensus)
>fit <- eBayes(fit)
>topTable(fit, coef=2)///

What's wrong with this? Apparently you have abandoned this approach, 
but you don't say why.

>==========================================
>I was trying to use block=Subject.Treat for corfit, but got an error
>message of " Too much damping -convergence tolerance not achievable in:
>glmgam.fit(.....)". I can only run with block=Subject for corfit. Can
>somebody explain this to me?
>==========================================

Firstly, using Subject.Treat as the blocking variable is incorrect, 
because Treat is the treatment of interest and must be put in the 
design matrix not used as part of the blocking.

Secondly, there are errors and there are warnings. What you have is a 
warning, not an error. An error is when R says "Error" and the 
execution stops. A warning is when R says "Warning" and the execution 
continues without interruption.

The fact that this warning message is not a reason for concern is 
explained on the help page for duplicateCorrelation(). Read it! A 
search through the Bioconductor mailing list archive for "Too much 
damping", would also reveal a discussion telling you the same thing.

//#Second possibility: //
>phenoData.pairs <- subset(phenoData, Pairs!="NA")
>data.pairs <- data[ ,row.names(phenoData.pairs)]
>design <- model.matrix(~phenoData.pairs$Pairs + phenoData.pairs$Treatment)
>fit <- lmFit(data.pairs, design)
>fit <- eBayes(fit)
>topTable(fit, coef=3)/
>=========================================
>The above code would ignore data from non-paired samples. Does it
>consider the technical replicates for some of the samples?

Yes, it does use the technical replicate data. How could it not, 
since the technical replicates are included in your dataset? From a 
hypothesis testing point of view, this approach is a somewhat 
over-optimistic, because you would be testing against technical variation only.

Best wishes
Gordon

>Thanks,
>Xia