[BioC] Limma design: paired samples, batch effect

Sat Apr 4 05:56:50 CEST 2009

Dear John,

You can't expect to be able to estimate more things than your experiment 
contains information about.  In this case, your batches are entirely 
confounded with patients+treatment, and hence you can't estimate 
coefficients for them.  That is what the message from limma is telling 
you.

BTW, you got a diagnostic message and a warning, not an "error" message.

Best wishes
Gordon

> Date: Thu, 2 Apr 2009 15:22:36 +0300
> From: John Burger <clinicalgenomics at gmail.com>
> Subject: [BioC] Limma design: paired samples, batch effect
> To: bioconductor at stat.math.ethz.ch
> Content-Type: text/plain
>
> Hello,
>
> This is yet again a question about limma design matrices. I would like to
> perform analysis on single channel arrays (Affy). The experimental design
> is: paired samples before and after treatment, and I would also like to take
> into account batch effect.
>
> I am having problems with the batch effect, by reading through mails in this
> mailing list, I've gotten the impression that you can include batch to your
> design matrix, but when I try to do that, I get an error message. Here is a
> simulated example of what I am trying to do:
>
> Create example date, patient (5 patients, 2 samples), treatment (A or B),
> and batch (three batches, 1-3)
>
>> patient <- factor(rep(1:5,2))
>> patient
> [1] 1 2 3 4 5 1 2 3 4 5
> Levels: 1 2 3 4 5
>> treatment <- factor(c(rep("A",5), rep("B",5)))
>> treatment
> [1] A A A A A B B B B B
> Levels: A B
>> batch <- factor(c(rep(1,3), rep(2,3), rep(3,4)))
>> batch
> [1] 1 1 1 2 2 2 3 3 3 3
> Levels: 1 2 3
>
> If I would like to perform a basic analysis between the paired samples and
> not taking into account batch effect, I could just do:
>
>> design <- model.matrix(~patient+treatment)
>> design
>   (Intercept) patient2 patient3 patient4 patient5 treatmentB
> 1            1        0        0        0        0          0
> 2            1        1        0        0        0          0
> 3            1        0        1        0        0          0
> 4            1        0        0        1        0          0
> 5            1        0        0        0        1          0
> 6            1        0        0        0        0          1
> 7            1        1        0        0        0          1
> 8            1        0        1        0        0          1
> 9            1        0        0        1        0          1
> 10           1        0        0        0        1          1
>
> And then use treatmentB to obtain the differences between treatments A and
> B.
>
> But, how could I include batch effect to the model? My inital thought was:
>
>> design <- model.matrix(~patient+treatment+batch)
>> design
>   (Intercept) patient2 patient3 patient4 patient5 treatmentB batch2 batch3
> 1            1        0        0        0        0          0      0      0
> 2            1        1        0        0        0          0      0      0
> 3            1        0        1        0        0          0      0      0
> 4            1        0        0        1        0          0      1      0
> 5            1        0        0        0        1          0      1      0
> 6            1        0        0        0        0          1      1      0
> 7            1        1        0        0        0          1      0      1
> 8            1        0        1        0        0          1      0      1
> 9            1        0        0        1        0          1      0      1
> 10           1        0        0        0        1          1      0      1
>
> But when I do that with actual data, I get the following error message:
>
> Coefficients not estimable: batch2 batch3
> Warning message:
> Partial NA coefficients for 10163 probe(s)
>
> Any ideas what I am doing wrong, and more importantly, what I should do to
> take that batch effect into account?
>
> Thanks,
> John