[BioC] Multifactorial edgeR GLM design question (contrast that I should make)

Wed Jan 29 15:17:48 CET 2014

Hi Zhihao,

On Tuesday, January 28, 2014 6:56:20 PM, Zhihao Tan wrote:
> Hi there,
>
> I have a question on whether some of the contrasts I am making in a
> multifactorial experiment should actually be made. I don't have a strong
> grasp of GLMs, so I might be missing something conceptually, and am hoping
> someone can advise.
>
> I am basically looking for genes that are differentially expressed in a
> certain phenotypic state (e.g. fluffy vs. smooth), but have set it up with
> 2 time-points (Day 2 and Day 5). I have trouble setting up the design using
> an equation (columns seem to disappear) so have gone ahead and created the
> design matrix using the method in 3.3.1 of the manual (pasting factors
> together). The design looks like this (I have removed replicates and many
> samples to simplify):
>
>     Day2.Fluffy Day2.Smooth Day5.Fluffy Day5.Smooth
> 1            0           1           0           0
> 7            1           0           0           0
> 13           0           0           0           1
> 16           0           0           1           0
> 19           0           0           0           1
> 35           0           0           1           0
> 36           0           0           1           0
>
> From what I understand, the above design is set up for 2 main effects
> (phenotype and time), and if I reduce it to 1 main effect (phenotype), I
> get the design below.
>
>     Fluffy Smooth
> 1       0      1
> 7       1      0
> 13      0      1
> 16      1      0
> 19      0      1
> 35      1      0
> 36      1      0
>
> The contrast I make in the latter case is basically (Fluffy - Smooth). The
> contrast that I did for the former case, and this is what I'm unsure of, is
> ((Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy - Day5.Smooth)). These tests
> are definitely not equivalent, and I get different number of sig. DE genes
> for both (more for the 2 effect design). In my mind, it makes sense,
> because the experiment *is* set up with 2 effects, and accounting for the
> biological variation in your model should allow you to be more powered to
> detect DE genes. However, I've never seen a contrast like that before. Does
> it even make sense to have an addition sign in the equation? What does that
> actually mean? Should I instead make contrasts of (Day2.Fluffy -
> Day2.Smooth) and (Day5.Fluffy - Day5.Smooth) and get the union or intersect
> of them?

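The contrast you are using is an unusual one, because a contrast 
usually tests the difference between groups, so you subtract rather 
than sum. Summing the two within-day differences tests (twice) the 
fluffy vs. smooth difference averaged over the two days. If you were 
to use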

(Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth)

then you would be testing the interaction of time and phenotype. In 
other words, the interaction looks for genes whose fluffy vs. smooth 
difference changes between Day 2 and Day 5. So if you think the effect 
of fluffiness depends on time, that is what you would likely want to 
test.
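
To make the two contrasts concrete, here is a minimal sketch in base R 
that rebuilds the four-column design quoted above and writes each 
contrast as a vector of weights, one per column. The group means in 
`mu` are made-up numbers for illustration only, and the edgeR calls in 
the trailing comments assume a DGEList `y` with dispersions already 
estimated, which is not shown here.

```r
# Group labels for the seven samples in the quoted design (rows 1, 7, 13, ...).
group <- factor(
  c("Day2.Smooth", "Day2.Fluffy", "Day5.Smooth", "Day5.Fluffy",
    "Day5.Smooth", "Day5.Fluffy", "Day5.Fluffy"),
  levels = c("Day2.Fluffy", "Day2.Smooth", "Day5.Fluffy", "Day5.Smooth")
)

# One column per group, no intercept, as in the "pasted factors" approach.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# (Day2.Fluffy - Day2.Smooth) - (Day5.Fluffy - Day5.Smooth): the interaction.
interaction_contrast <- c(Day2.Fluffy = 1, Day2.Smooth = -1,
                          Day5.Fluffy = -1, Day5.Smooth = 1)

# (Day2.Fluffy - Day2.Smooth) + (Day5.Fluffy - Day5.Smooth): the summed form.
summed_contrast <- c(Day2.Fluffy = 1, Day2.Smooth = -1,
                     Day5.Fluffy = 1, Day5.Smooth = -1)

# The weights must line up with the design columns.
stopifnot(identical(names(interaction_contrast), colnames(design)))
stopifnot(identical(names(summed_contrast), colnames(design)))

# Hypothetical log-scale group means for one gene (illustration only).
mu <- c(Day2.Fluffy = 5, Day2.Smooth = 3, Day5.Fluffy = 6, Day5.Smooth = 3)

interaction_value <- sum(interaction_contrast * mu)  # (5 - 3) - (6 - 3) = -1
summed_value      <- sum(summed_contrast * mu)       # (5 - 3) + (6 - 3) =  5
average_effect    <- summed_value / 2                # mean within-day difference = 2.5

print(c(interaction = interaction_value,
        summed = summed_value,
        average = average_effect))

# With edgeR, and assuming `y` is a DGEList with dispersions estimated:
#   fit <- glmFit(y, design)
#   lrt <- glmLRT(fit, contrast = interaction_contrast)
#   topTags(lrt)
```

Both weight vectors sum to zero. Under these made-up means the 
interaction contrast picks up the change in the fluffy vs. smooth 
difference between days, while half of the summed contrast is the 
fluffy vs. smooth difference averaged over the two days.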

Best,

Jim

>
> Hope someone can help on this, and thanks in advance!
>
> Regards,
> Zhihao
> Graduate Student
> University of Washington

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099