[BioC] How to design matrix on edgeR to study genotype x environmental interaction
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Nov 11 00:15:40 CET 2012
Dear Daniela,
What version of the edgeR are you using? The posting guide asks you to
give sessionInfo() output so we can see package versions.
Your codes looks correct for testing an interaction, although you could
estimate the same interaction more directly using an interaction formula
as in Section 3.3.4 of the edgeR User's Guide.
However the model you have used is correct only if all 12 samples
correspond to the same physiological stage. I wonder why you are not
analysing all the 48 samples together. I would start with data
exploration of all 48 samples, including exploration measures like
transcript filtering, library sizes, normalization factors, an MDS plot, a
BCV plot, and so on. The first step is to check the data quality before
going on to test for differential expression.
edgeR has very high statistical power, even giving p-values smaller than I
would like in some cases. So if you're not getting any differential
expression, it is because there is none or because you have data quality
problems.
Best wishes
Gordon
> Date: Fri, 9 Nov 2012 14:44:28 +0100
> From: Daniela Lopes Paim Pinto <d.lopespaimpinto at sssup.it>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] How to design matrix on edgeR to study genotype x
> environmental interaction
>
> Dear Gordon,
>
> Thank you so much for the reference. I read all the chapter regarding to
> the models and I tried to set up the following code considering a data
> frame like this:
>
>> target
> Sample Variety Location
> 1 1 CS Mont
> 2 2 CS Mont
> 3 25 CS Bol
> 4 26 CS Bol
> 5 49 CS Ric
> 6 50 CS Ric
> 7 13 SG Mont
> 8 14 SG Mont
> 9 37 SG Bol
> 10 38 SG Bol
> 11 61 SG Ric
> 12 62 SG Ric
>
>> group <- factor(paste(target$Variety,target$Location,sep="_"))
>> cbind(target,Group=group)
>> d <- DGEList(counts=file,group=group)
>> DGEnorm <- calcNormFactors(d)
>> design <- model.matrix(~0+group, data=DGEnorm$samples)
>> colnames(design) <- levels(group)
>
> Which gave me the design matrix:
>
>> design
> CS_Bol CS_Mont CS_Ric SG_Bol SG_Mont SG_Ric
> CS_Mont 0 1 0 0 0 0
> CS_Mont.1 0 1 0 0 0 0
> CS_Bol 1 0 0 0 0 0
> CS_Bol.1 1 0 0 0 0 0
> CS_Ric 0 0 1 0 0 0
> CS_Ric.1 0 0 1 0 0 0
> SG_Mont 0 0 0 0 1 0
> SG_Mont.1 0 0 0 0 1 0
> SG_Bol 0 0 0 1 0 0
> SG_Bol.1 0 0 0 1 0 0
> SG_Ric 0 0 0 0 0 1
> SG_Ric.1 0 0 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
> And then I estimated the trended and tag wise dispersion and fit the model
> doing:
>
>> disp.tren <- estimateGLMTrendedDisp(DGEnorm,design)
>> disp.tag <- estimateGLMTagwiseDisp(disp.tren,design)
>> fit <- glmFit(disp.tag,design)
>
> When I made some contrasts to find DE miRNAs, for example:
>
>> my.constrasts <- makeContrasts(CS_BolvsMont = CS_Bol-CS_Mont,
> CSvsSG_BolvsMont = (CS_Bol-CS_Mont)-(SG_Bol-SG_Mont), levels=design)
>> lrt <- glmLRT(fit, contrast=my.constrasts[,"CS_BolvsMont"])
>
> I expected to find DE miRNAs due the environment effect (CS_BolvsMont) and
> for example DE miRNAs due the interaction genotypeXenvironment (
> CSvsSG_BolvsMont).
>
> However the results do not seems to reflect it, since I did not get even a
> single DE miRNA with significant FDR (even less than 20%!!!!) and going
> back to the counts in the raw data I find reasonable differences in their
> expression, which was expected. I forgot to mention that I decided to
> consider stage by stage separately and not add one more factor on the
> model, since I am not interested, for the moment, on the time course (as I
> wrote in the previous email - see below).
>
> Could you (or any body else from the list) give me some advise regarding
> the code? Is this matrix appropriate for the kind of comparisons I am
> interested on?
>
> Thank you in advance for any input.
>
> Daniela
>
>
>
>
> 2012/10/30 Gordon K Smyth <smyth at wehi.edu.au>
>
>> Dear Daniela,
>>
>> edgeR can work with any design matrix. Just setup your interaction
>> model using standard R model formula. See for example Chapter 11 of:
>>
>>
http://cran.r-project.org/doc/**manuals/R-intro.pdf<http://cran.r-project.org/doc/manuals/R-intro.pdf>
>>
>> Best wishes
>> Gordon
>>
>> Date: Mon, 29 Oct 2012 16:24:31 +0100
>>> From: Daniela Lopes Paim Pinto <d.lopespaimpinto at sssup.it>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] How to design matrix on edgeR to study genotype x
>>> environmental interaction
>>>
>>> Dear all,
>>>
>>> I'm currently working with data coming from deep sequencing of 48
>>> small RNAs libraries and using edgeR to identify DE miRNAs. I could
>>> not figure out how to design my matrix for the following experimental
>>> design:
>>>
>>> I have 2 varieties (genotypes), cultivated in 3 different locations
>>> (environments) and collected in 4 physiological stages. None of them
>>> represent a control treatment. I'm particulary interested on
>>> identifying those miRNAs which modulate their expression dependent on
>>> genotypes (G), environments (E) and G x E interaction. For instance
>>> the same variety in the 3 different locations, both varieties in the
>>> same location and both varieties in the 3 different locations.
>>>
>>> I was wondering if I could use the section 3.3 of edgeR user guide as
>>> reference or if someone could suggest me any other alternative method.
>>>
>>> Thanks in advance
>>>
>>> Daniela
>>>
>>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list