[BioC] GLM design matrix not symmetric

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jan 25 09:00:49 CET 2014

Dear Michael,

Intuitively, I'm sure you will appreciate that, if you lose one member of 
a matched pair in a paired analysis, then it becomes impossible to make 
any comparisons using the remaining member of that pair.
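The intuition can be checked with a little linear algebra in base R. The sketch below is an illustration, not edgeR code, and `make_design()` and `resid_df()` are hypothetical helpers. It builds the five-sample design and the four-sample design without the unpaired day-1 WT sample. It then compares residual degrees of freedom (samples minus design rank) and the leverage of each sample in the five-sample design.

```r
# Hypothetical helper: additive day + genotype design with WT as the reference level
make_design <- function(day, condition) {
  samples <- data.frame(day = factor(day),
                        condition = factor(condition, levels = c("WT", "KO")))
  model.matrix(~ day + condition, data = samples)
}

# Hypothetical helper: residual df = number of samples - rank of the design
resid_df <- function(design) nrow(design) - qr(design)$rank

design5 <- make_design(c(1, 2, 3, 2, 3), c("WT", "WT", "WT", "KO", "KO"))
design4 <- make_design(c(2, 3, 2, 3), c("WT", "WT", "KO", "KO"))

# The unpaired sample adds one row but also one parameter: both equal 1
c(full = resid_df(design5), dropped = resid_df(design4))

# Leverages (diagonal of the hat matrix X (X'X)^-1 X') for the five-sample design:
# sample 1 has leverage 1, the other four have 0.75
hat5 <- diag(design5 %*% solve(crossprod(design5), t(design5)))
round(hat5, 2)
```

A leverage of 1 means the day-1 WT sample's fitted value must equal its observed value whatever the data. That sample is therefore used up estimating the day-1 baseline and carries no information about KO vs WT.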

From a mathematical point of view, edgeR has no requirement for there to
be equal numbers of samples in the different genotype groups, so the 
analysis approach does the right thing and remains valid.  In the simple 
example you give, edgeR will in effect remove the first sample from the 
analysis.  So you will get identical KO vs WT DE results from either:

   day <- factor(c(1,2,3,2,3))
   condition <- factor(c("WT","WT","WT","KO","KO"),levels=c("WT","KO"))
   design <- model.matrix(~day+condition)

or

   day <- factor(c(2,3,2,3))
   condition <- factor(c("WT","WT","KO","KO"),levels=c("WT","KO"))
   design <- model.matrix(~day+condition)

with the day1 WT sample removed.
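The equivalence of the two designs can also be checked numerically. This is a sketch under simplifying assumptions. It uses base R's `glm.fit()` with a Poisson family as a stand-in for edgeR's negative binomial GLM, so there is no dispersion and there are no library-size offsets. The counts for a single gene are made up for illustration, and `make_design()` is a hypothetical helper. The KO coefficient should agree between the five-sample and four-sample fits, and the day-1 WT sample should be fitted exactly.

```r
# Hypothetical helper: additive day + genotype design with WT as the reference level
make_design <- function(day, condition) {
  samples <- data.frame(day = factor(day),
                        condition = factor(condition, levels = c("WT", "KO")))
  model.matrix(~ day + condition, data = samples)
}

# Made-up counts for one gene, in the same sample order as the designs
counts5 <- c(120, 95, 140, 60, 88)
counts4 <- counts5[-1]  # drop the unpaired day-1 WT sample

design5 <- make_design(c(1, 2, 3, 2, 3), c("WT", "WT", "WT", "KO", "KO"))
design4 <- make_design(c(2, 3, 2, 3), c("WT", "WT", "KO", "KO"))

# Poisson log-linear fits (a simplification of edgeR's NB GLM)
fit5 <- glm.fit(design5, counts5, family = poisson())
fit4 <- glm.fit(design4, counts4, family = poisson())

# KO vs WT log fold-change (natural log) from each fit; the two agree
c(full = unname(fit5$coefficients["conditionKO"]),
  dropped = unname(fit4$coefficients["conditionKO"]))

# The day-1 WT sample is reproduced exactly by the baseline parameter
fit5$fitted.values[1]
```

In this Poisson sketch the KO coefficient also equals log(148/235). That is the log ratio of the KO and WT totals over days 2 and 3, which is the maximum-likelihood answer for an additive log-linear model on a 2 x 2 table.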

Best wishes

> Date: Fri, 24 Jan 2014 01:18:32 +0000
> From: Michael Moore <mmoore at mail.rockefeller.edu>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] GLM design matrix not symmetric
> Hello,
> I have a question about the validity of using the GLM strategy in edgeR 
> if the design matrix is not symmetric.
> I am dealing with RNA-seq data from cultured T-cells from mice that are 
> WT or KO'd for my gene of interest. In addition to the WT vs. KO 
> variable, there are also "batch" effects with this data such that the 
> profiles cluster by genotype (as expected), but also by litter of mice. 
> Typically, I have 3 biological replicates for each genotype, all 
> collected on different days (i.e. from different litters). To deal with 
> this, I have been using the GLM method in edgeR with the following 
> design matrix:
> day <- factor(c(1,2,3,1,2,3))
> condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
> design <- model.matrix(~day+condition)
> The DE analysis with this method was more sensitive in detecting 
> differences due to genotype vs. the "classic" exactTest method.
> My problem arises in a new experiment where one of the KO replicates 
> failed, but the matched WT was fine. The corresponding design matrix is:
> day <- factor(c(1,2,3,2,3))
> condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
> design <- model.matrix(~day+condition)
> Is such a design valid?
> Thanks very much for your time.
> Michael
