[BioC] GLM design matrix not symmetric
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Jan 25 09:15:27 CET 2014
Hi Michael,
As a postscript, there is a way to recover at least partial information
from the day1 WT sample, even when the day1 KO sample has been lost.
This requires a random effect or correlation approach instead of a paired
t-test type analysis. If the litter effect is not very strong, this can
be a useful approach. To do such an analysis, you would need to switch to
a voom-limma analysis pipeline and use the duplicateCorrelation function
of the limma package.
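
A minimal sketch of such a pipeline follows, under some loudly stated assumptions: the counts are simulated placeholders rather than real data, the gene names and object names are hypothetical, and the statmod package is assumed to be installed because duplicateCorrelation needs it. Litter (day) is treated as a block with a common within-block correlation rather than as a fixed effect, so the unpaired day1 WT sample still contributes.

```r
library(edgeR)
library(limma)

set.seed(1)

# Sample layout from the thread: the day1 KO sample is missing.
day <- factor(c(1, 2, 3, 2, 3))
condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))

# Placeholder counts (200 hypothetical genes x 5 samples); substitute real data.
counts <- matrix(rnbinom(200 * 5, mu = 100, size = 10), nrow = 200)
rownames(counts) <- paste0("gene", seq_len(200))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Only genotype enters the design; day is handled through the correlation.
design <- model.matrix(~condition)

# voom converts counts to log-CPM with precision weights.
v <- voom(y, design)

# Estimate the consensus correlation between samples from the same litter.
corfit <- duplicateCorrelation(v, design, block = day)

# Fit the linear model allowing for that within-litter correlation.
fit <- lmFit(v, design, block = day, correlation = corfit$consensus)
fit <- eBayes(fit)

tt <- topTable(fit, coef = "conditionKO")
```

With real data it is worth looking at corfit$consensus before trusting the results. A small positive value is the situation in which this approach is likely to help.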
Best wishes
Gordon
On Sat, 25 Jan 2014, Gordon K Smyth wrote:
> Dear Michael,
>
> Intuitively, I'm sure you will appreciate that, if you lose one member of a
> matched pair in a paired analysis, then it becomes impossible to make any
> comparisons using the remaining member of that pair.
>
> From a mathematical point of view, edgeR has no requirement for there to be
> equal numbers of samples in the different genotype groups, so the analysis
> approach does the right thing and remains valid. In the simple example you
> give, edgeR will in effect remove the first sample from the analysis. So you
> will get identical KO vs WT DE results from either:
>
> day <- factor(c(1,2,3,2,3))
> condition <- factor(c("WT","WT","WT","KO","KO"),levels=c("WT","KO"))
> design <- model.matrix(~day+condition)
>
> or
>
> day <- factor(c(2,3,2,3))
> condition <- factor(c("WT","WT","KO","KO"),levels=c("WT","KO"))
> design <- model.matrix(~day+condition)
>
> with the day1 WT sample removed.
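
The equivalence can be checked on a toy example. The sketch below uses an ordinary linear model (lm) as a stand-in for the edgeR negative binomial GLM, which is an assumption made purely for illustration. The values in y are made-up numbers for a single hypothetical gene.

```r
set.seed(1)

# Five-sample layout: day1 has a WT sample but no KO partner.
d5 <- data.frame(
  y = rnorm(5),  # made-up log-expression values for one gene
  day = factor(c(1, 2, 3, 2, 3)),
  condition = factor(c("WT", "WT", "WT", "KO", "KO"), levels = c("WT", "KO"))
)

# Four-sample layout: the same data with the day1 WT sample removed.
d4 <- droplevels(d5[d5$day != "1", ])

fit5 <- lm(y ~ day + condition, data = d5)
fit4 <- lm(y ~ day + condition, data = d4)

# The KO vs WT estimate and its standard error agree between the two fits.
est5 <- coef(summary(fit5))["conditionKO", c("Estimate", "Std. Error")]
est4 <- coef(summary(fit4))["conditionKO", c("Estimate", "Std. Error")]
same <- isTRUE(all.equal(est5, est4))

# The day1 WT sample is fitted exactly by its own day term, so its residual
# is zero and it carries no information about the genotype effect.
resid_day1 <- unname(residuals(fit5)[1])
```

In both fits the genotype coefficient is the average of the two within-day KO minus WT differences, with one residual degree of freedom.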
>
> Best wishes
> Gordon
>
>> Date: Fri, 24 Jan 2014 01:18:32 +0000
>> From: Michael Moore <mmoore at mail.rockefeller.edu>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] GLM design matrix not symmetric
>>
>> Hello,
>>
>> I have a question about the validity of using the GLM strategy in edgeR if
>> the design matrix is not symmetric.
>>
>> I am dealing with RNA-seq data from cultured T-cells from mice that are WT
>> or KO'd for my gene of interest. In addition to the WT vs. KO variable,
>> there are also "batch" effects with this data such that the profiles
>> cluster by genotype (as expected), but also by litter of mice. Typically, I
>> have 3 biological replicates for each genotype, all collected on different
>> days (i.e. from different litters). To deal with this, I have been using
>> the GLM method in edgeR with the following design matrix:
>>
>> day <- factor(c(1,2,3,1,2,3))
>> condition <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"), levels=c("WT", "KO"))
>> design <- model.matrix(~day+condition)
>>
>> The DE analysis with this method was more sensitive in detecting
>> differences due to genotype vs. the "classic" exactTest method.
>>
>> My problem arises in a new experiment where one of the KO replicates
>> failed, but the matched WT was fine. The corresponding design matrix is:
>>
>> day <- factor(c(1,2,3,2,3))
>> condition <- factor(c("WT", "WT", "WT", "KO", "KO"), levels=c("WT", "KO"))
>> design <- model.matrix(~day+condition)
>>
>> Is such a design valid?
>>
>> Thanks very much for your time.
>>
>> Michael
>