[BioC] [Bioc-sig-seq] interaction factor in edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Wed May 11 06:52:19 CEST 2011
Dear Biase,
Your questions are really general questions about two way models, rather
than specifically to do with edgeR. I'll try to give some general advice,
but ultimately it depends on your own scientific questions.
First point, if you fit an interaction model, it doesn't usually make
sense to test for a main effect (like the term 'a' in your model below),
at least not unless you really know what you're doing. The interaction
model implies that the time effect depends on the type of embryo, so there
isn't a single unambiguous time effect to test for.
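
To illustrate why the main effect is ambiguous under an interaction model, here is a minimal sketch in base R. The `targets` table and the noise-free expression values in `mu` are hypothetical, invented only to show how the coefficients are parameterised; they are not taken from the experiment discussed here.

```r
# Hypothetical 2x2 layout: a = time point, b = embryo type.
targets <- data.frame(
  a = factor(rep(c("Time1", "Time2"), times = 2), levels = c("Time1", "Time2")),
  b = factor(rep(c("Normal", "Cloned"), each = 2), levels = c("Normal", "Cloned"))
)

# Made-up, noise-free log-expression values for one gene:
# the time effect is +1 in Normal embryos but +3 in Cloned embryos.
mu <- c(NormalTime1 = 5, NormalTime2 = 6, ClonedTime1 = 4, ClonedTime2 = 7)
targets$y <- unname(mu[paste0(targets$b, targets$a)])

# Fit the interaction model and inspect the coefficients.
beta <- coef(lm(y ~ a + b + a:b, data = targets))

# 'aTime2' is the time effect in the reference embryo type (Normal) only.
time_normal <- beta["aTime2"]

# The time effect in Cloned embryos needs the interaction term added.
time_cloned <- beta["aTime2"] + beta["aTime2:bCloned"]

print(beta)
print(c(normal = unname(time_normal), cloned = unname(time_cloned)))
```

With the default treatment contrasts, testing the second coefficient of the interaction model therefore tests the time effect in the reference embryo type only, not an overall time effect.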
If you really want to use embryo type as a blocking variable, you need to
remove the interaction. In that case, factor 'a' would be interpreted as
a time effect that is consistent across the two embryo types.
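
As a rough sketch of the blocking design, assuming a hypothetical, unbalanced `targets` table with the same column names `a` and `b` as in the model formula quoted below, the additive design matrix can be built in base R like this:

```r
# Hypothetical, unbalanced sample sheet: a = time point, b = embryo type.
targets <- data.frame(
  a = factor(c("Time1", "Time1", "Time2", "Time2", "Time2",
               "Time1", "Time1", "Time2"),
             levels = c("Time1", "Time2")),
  b = factor(c("Normal", "Normal", "Normal", "Normal", "Normal",
               "Cloned", "Cloned", "Cloned"),
             levels = c("Normal", "Cloned"))
)

# Additive model: embryo type acts as a block, with no interaction term.
design_add <- model.matrix(~ a + b, data = targets)

# Column 2 ('aTime2') is the time effect, assumed to be the same
# in both embryo types.
print(colnames(design_add))
```

A design like this could then be passed to the edgeR GLM functions, for example `glmFit()` followed by `glmLRT(fit, coef = 2)`, to test the common time effect.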
If instead you want to test for separate time effects in normal and cloned
embryos, then the best way to do that would be to treat your experiment as
having one factor with four levels: NormalTime1, NormalTime2, ClonedTime1,
ClonedTime2. Then you could test for a time effect for cloned embryos
from the pairwise comparison
ClonedTime2-ClonedTime1
and so on for other comparisons. Most biologists find this to be a more
self-explanatory way to proceed than using the model formulas. You can
easily do this with the "classic" edgeR approach, i.e., you don't need the
GLM functions.
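
A minimal sketch of the one-factor, four-level approach follows. The sample sheet and the simulated counts are hypothetical placeholders, and the edgeR part runs only if the package is installed. The calls used (`DGEList`, `calcNormFactors`, `estimateCommonDisp`, `estimateTagwiseDisp`, `exactTest`, `topTags`) are the classic edgeR functions; treat the pipeline as an illustration rather than a recommended analysis.

```r
# Hypothetical sample sheet with unequal group sizes.
targets <- data.frame(
  embryo = c("Normal", "Normal", "Normal", "Normal", "Normal",
             "Cloned", "Cloned", "Cloned", "Cloned"),
  time   = c("Time1", "Time1", "Time2", "Time2", "Time2",
             "Time1", "Time1", "Time2", "Time2")
)

# Combine the two factors into a single factor with four levels.
group <- factor(paste0(targets$embryo, targets$time),
                levels = c("NormalTime1", "NormalTime2",
                           "ClonedTime1", "ClonedTime2"))
print(table(group))

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated counts, for illustration only.
  set.seed(1)
  counts <- matrix(rnbinom(200 * length(group), mu = 50, size = 10),
                   nrow = 200,
                   dimnames = list(paste0("gene", 1:200), NULL))

  y <- edgeR::DGEList(counts = counts, group = group)
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateCommonDisp(y)
  y <- edgeR::estimateTagwiseDisp(y)

  # The first group in 'pair' is the baseline, so this tests
  # ClonedTime2 - ClonedTime1.
  et <- edgeR::exactTest(y, pair = c("ClonedTime1", "ClonedTime2"))
  print(edgeR::topTags(et, n = 5))
}
```

The time effect for normal embryos would be tested the same way with `pair = c("NormalTime1", "NormalTime2")`.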
Best wishes
Gordon
On Tue, 10 May 2011, Biase, Fernando wrote:
> Dear Prof Smyth,
>
> in
>
> design <- model.matrix(~ a + b + a:b , data=targets)
>
> my interest is in factor a (coef=2).
>
> "Do you expect the effect of experimental factor b to be the same for each
> level of a? If yes, then maybe you don't need the interaction term.
> It depends on your experiment and on the questions you want to ask."
>
> I am not sure, but I guess the answer is no. The experiment consists of
> embryos collected at two time points (factor a), normal or cloned
> embryos (factor b). And on top of it, it is an unbalanced sample. I have
> previously tested the hypothesis of whether cloning affects the gene
> expression, for which I do not need the first factor (a). I am using the
> factor b as a block to test the hypothesis of whether the expression is
> different between time points (factor a).
>
> Please, let me know if you think otherwise.
>
> thanks for the reply,
>
> Fernando
>
> ________________________________________
> From: Gordon K Smyth [smyth at wehi.EDU.AU]
> Sent: Tuesday, May 10, 2011 6:53 PM
> To: Biase, Fernando
> Cc: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] interaction factor in edgeR
>
> Dear Fernando,
>
>> Date: Tue, 10 May 2011 13:40:23 -0500
>> From: "Biase, Fernando" <biase at illinois.edu>
>> To: "bioc-sig-sequencing at r-project.org"
>> <bioc-sig-sequencing at r-project.org>
>> Subject: [Bioc-sig-seq] interaction factor in edgeR
>>
>> Dear list users,
>>
>> I am not a statistician, so pardon my ignorance.
>>
>> When using the edgeR package to analyse RNA-seq data, the number of
>> differentially expressed genes varies depending on whether I use an
>> interaction factor in the design. Can anyone suggest why this happens?
>
> Well, you fit a different model, and test a different hypothesis, so the
> results change. No doubt the residual dispersion has changed as well.
> Wouldn't you be worried if the results didn't change?
>
>> Example:
>>
>> if I use:
>> design <- model.matrix(~ a + b , data=targets)
>>
>> I have:
>> summary(decideTests_eset_b_tmm)
>>     [,1]
>> -1  2855
>> 0  12346
>> 1   4928
>>
>> if I use:
>> design <- model.matrix(~ a + b + a:b , data=targets)
>>
>> then:
>> summary(decideTests_eset_b_tmm)
>>    [,1]
>> -1 3343
>> 0  9490
>> 1  4191
>
> You haven't actually told us which coefficient you're testing for.
>
>> When having more than one factor, is it more appropriate to have the
>> interaction factor in the design?
>
> Do you expect the effect of experimental factor b to be the same for each
> level of a? If yes, then maybe you don't need the interaction term. It
> depends on your experiment and on the questions you want to ask.
>
>> Thanks a lot
>> Best,
>>
>> Fernando
>
> BTW, I would much prefer it if you would post questions about edgeR to the
> main Bioconductor mailing list rather than to bioc-sig-sequencing. The
> questions relate more to the general problem of analysing gene expression
> experiments rather than to details of particular sequencing technologies.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Tel: (03) 9345 2326, Fax (03) 9347 0852,
> smyth at wehi.edu.au
> http://www.wehi.edu.au
> http://www.statsci.org/smyth