[BioC] GLM design matrix not symmetric

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jan 25 09:15:27 CET 2014


Hi Michael,

As a postscript, there is a way to recover at least partial information 
from the day1 WT sample, even though the day1 KO sample has been lost. 
This requires treating litter (day) as a random effect, that is, estimating 
a within-litter correlation, instead of fitting it as a fixed effect as in 
a paired t-test style analysis.  If the litter effect is not very strong, 
this can be a useful approach.  To do such an analysis, you would need to 
switch to a voom-limma analysis pipeline and use the duplicateCorrelation 
function of the limma package.
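
A minimal sketch of what that pipeline might look like is given here.  It 
assumes the edgeR and limma packages are installed.  The count matrix is 
simulated purely so that the code runs (it is not data from this thread), 
and the object names (counts, corfit, and so on) are illustrative only.  
Note that day is dropped from the design matrix and passed as the blocking 
variable instead.

```r
library(edgeR)
library(limma)

# ASSUMPTION: simulated counts (1000 genes x 5 samples), for illustration only.
# Replace with the real count matrix.
set.seed(1)
counts <- matrix(rnbinom(1000 * 5, mu = 100, size = 10), ncol = 5)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Sample annotation for the unbalanced experiment (day1 KO missing)
day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))

# Litter (day) is NOT a fixed effect here; it enters as a random block below
design <- model.matrix(~condition)

# Normalize and transform the counts to log-cpm with precision weights
y <- DGEList(counts = counts)
y <- calcNormFactors(y)
v <- voom(y, design)

# Estimate the consensus within-litter correlation across genes
corfit <- duplicateCorrelation(v, design, block = day)

# Fit the linear model, allowing for the within-litter correlation
fit <- lmFit(v, design, block = day,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# KO vs WT results; the day1 WT sample now contributes to the WT mean
topTable(fit, coef = "conditionKO")
```

With only three litters the correlation estimate relies heavily on pooling 
across genes, so it is worth looking at corfit$consensus.correlation before 
trusting the results.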

Best wishes
Gordon


On Sat, 25 Jan 2014, Gordon K Smyth wrote:

> Dear Michael,
>
> Intuitively, I'm sure you will appreciate that, if you lose one member of a 
> matched pair in a paired analysis, then it becomes impossible to make any 
> comparisons using the remaining member of that pair.
>
> From a mathematical point of view, edgeR has no requirement for there to be 
> equal numbers of samples in the different genotype groups, so the analysis 
> approach does the right thing and remains valid.  In the simple example you 
> give, edgeR will in effect remove the first sample from the analysis.  So you 
> will get identical KO vs WT DE results from either:
>
>  day <- factor(c(1,2,3,2,3))
>  condition <- factor(c("WT","WT","WT","KO","KO"),levels=c("WT","KO"))
>  design <- model.matrix(~day+condition)
>
> or
>
>  day <- factor(c(2,3,2,3))
>  condition <- factor(c("WT","WT","KO","KO"),levels=c("WT","KO"))
>  design <- model.matrix(~day+condition)
>
> with the day1 WT sample removed.
>
> Best wishes
> Gordon
>
>> Date: Fri, 24 Jan 2014 01:18:32 +0000
>> From: Michael Moore <mmoore at mail.rockefeller.edu>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] GLM design matrix not symmetric
>> 
>> Hello,
>> 
>> I have a question about the validity of using the GLM strategy in edgeR if 
>> the design matrix is not symmetric.
>> 
>> I am dealing with RNA-seq data from cultured T-cells from mice that are WT 
>> or KO'd for my gene of interest. In addition to the WT vs. KO variable, 
>> there are also "batch" effects with this data such that the profiles 
>> cluster by genotype (as expected), but also by litter of mice. Typically, I 
>> have 3 biological replicates for each genotype, all collected on different 
>> days (i.e. from different litters). To deal with this, I have been using 
>> the GLM method in edgeR with the following design matrix:
>> 
>> day <- factor(c(1,2,3,1,2,3))
>> condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
>> design <- model.matrix(~day+condition)
>> 
>> The DE analysis with this method was more sensitive in detecting 
>> differences due to genotype vs. the "classic" exactTest method.
>> 
>> My problem arises in a new experiment where one of the KO replicates 
>> failed, but the matched WT was fine. The corresponding design matrix is:
>> 
>> day <- factor(c(1,2,3,2,3))
>> condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
>> design <- model.matrix(~day+condition)
>> 
>> Is such a design valid?
>> 
>> Thanks very much for your time.
>> 
>> Michael
>
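
The statement in the quoted reply that the two designs give identical KO vs 
WT results can be checked with a small base R example.  This is only an 
ordinary linear model analogue using lm(), not the negative binomial GLM 
that edgeR fits, and the expression values are made up for illustration.  
The reasoning is the same, though: the day1 WT sample is the only sample 
from day 1, so the day 1 term fits it exactly and it contributes nothing to 
the KO vs WT comparison.

```r
# ASSUMPTION: made-up log-expression values for a single gene, illustration only
full <- data.frame(
  y         = c(5.1, 6.3, 5.8, 7.9, 7.0),
  day       = factor(c(1, 2, 3, 2, 3)),
  condition = factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))
)

# Same data with the unpaired day1 WT sample removed (unused day level dropped)
reduced <- droplevels(full[-1, ])

# Additive model with day as a fixed blocking factor, as in the thread
fit_full    <- lm(y ~ day + condition, data = full)
fit_reduced <- lm(y ~ day + condition, data = reduced)

# Estimate, standard error, t-statistic and p-value for KO vs WT
ko_full    <- coef(summary(fit_full))["conditionKO", ]
ko_reduced <- coef(summary(fit_reduced))["conditionKO", ]

print(rbind(full = ko_full, reduced = ko_reduced))

# TRUE if the two fits agree on the KO vs WT comparison
print(isTRUE(all.equal(ko_full, ko_reduced)))
```

Both fits have one residual degree of freedom (5 samples minus 4 
coefficients, or 4 samples minus 3), which is why the standard errors agree 
as well as the estimates.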
