[BioC] edgeR: design matrix for different condition
KJ Lim
jinkeanlim at gmail.com
Mon Aug 6 06:16:26 CEST 2012
Dear Prof Gordon,
Good day.
Thanks for your time to update the edgeR User's Guide. It is useful.
I combined all my experiment factors into one combined factor and
described the experiment matrix like the example in the User's guide:
> Group <- factor(paste(targets$treat, targets$tree,sep="."))
> cbind(targets,Group=Group)
> hl.design <- model.matrix(~0+Group)
But, I encountered an error when I performed the estimate dispersion
for my data
> hl <- estimateGLMCommonDisp(hl, hl.design)
Warning message:
In estimateGLMCommonDisp.default(y = y$counts, design = design, :
No residual df: setting dispersion to NA
I tried to figure out what I have done wrong, unfortunately, I have no
luck on that. Could you or the community kindly please light me for
this matter?
Thank you very much for your time.
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i486-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.6.10 limma_3.12.1
loaded via a namespace (and not attached):
[1] tcltk_2.15.1 tools_2.15.1
Best regards,
KJ Lim
On 31 July 2012 02:24, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear KH Kim,
>
> No, you have not understood me correctly. I did not suggest that you change
> from edgeR to limma. I suggested that you read the limma documentation
> because the design matrix is the same for edgeR as it is for limma, so the
> limma documentation would help you create the design matrix for your edgeR
> analysis.
>
> I thought from your last email that you had already done this and that you
> had completed the edgeR analysis satisfactorily.
>
> I wrote more documentation for the edgeR User's Guide a couple of days ago,
> trying to give advice for experiments such as yours. You might find that
> Section 3.3 of
>
> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>
> gives more explanation.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Tel: (03) 9345 2326, Fax (03) 9347 0852,
> http://www.statsci.org/smyth
>
>
> On Mon, 30 Jul 2012, KJ Lim wrote:
>
>> Dear Prof Gordon,
>>
>> Good day.
>>
>> I have read the section 8.5 of the Limma manual as you suggested in
>> previous emails. Thanks.
>>
>> If I understand you correctly, you would suggest me to carry out my DE
>> analysis with limma package if I would like to learn the which genes
>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H.
>>
>> May I ask how can I generate the EList, "eset", in order to fit in the
>> linear model as mentioned in the limma manual
>>
>> fit <- lmFit(eset, design)
>> fit <- eBayes(fit)
>>
>> Please correct me if I'm wrong.
>>
>> Thank you very much for your time and help.
>>
>> Best regards,
>> KJ Lim
>>
>>
>>
>> On 27 July 2012 02:31, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>> Dear KJ Lim,
>>>
>>> Thanks for the rephrasing, which is clearer. I would have liked you to
>>> mention however whether you read the documentation that I refered you do.
>>>
>>>
>>> On Thu, 26 Jul 2012, KJ Lim wrote:
>>>
>>>> Dear Prof Gordon,
>>>>
>>>> Good day. Thanks for your prompt replied.
>>>>
>>>> Please allow me to rephrase my previous question.
>>>>
>>>> This model, ~tree+treat, is assumed effect of the time of treatment
>>>> will be the same irrespective of genotype.
>>>
>>>
>>>
>>> Yes, this model makes that assumption.
>>>
>>>
>>>> Thus, test for the coef=3 will gives me the differential expression at
>>>> 96H
>>>> irrespective of genotype.
>>>
>>>
>>>
>>> Actually coef=3 refers to 24H, according to the design matrix in your
>>> original email. A test for coef=3 would test for DE at 24H vs 0H,
>>> irrespective of genotype. But only if the assumption mentioned above is
>>> true, which is unlikely given the rest of your email.
>>>
>>>
>>>> I would like to learn the differential expression of treeHS vs treeLS
>>>> at specific time points i.e. 24H or if possible across the whole time
>>>> points. Should I fit my model as
>>>>
>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ?
>>>
>>>
>>>
>>> These models are equivalent, so it is just a matter of convenience which
>>> one
>>> you use, as I tried to explain in the limma documentation I refered you
>>> to.
>>>
>>>
>>>> The design matrix columns of model B give:
>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H"
>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H"
>>>> "treeLS:treat96H"
>>>>
>>>> Am I doing right if I fit the model B and test for the coef=3:4 to
>>>> learn the differential expression of treeHS vs treeLS at specific time
>>>> points i.e. 03H? Does this test could tells me which set of genes
>>>> up/down-regulated in treeLS or treeHS?
>>>
>>>
>>>
>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS.
>>> So
>>> testing for both coefficients coef=3:4 tests for a 3H effect in either
>>> treeLS or treeHS.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>
>>>> I'm not good in statistic and I'm learning, kindly please correct me
>>>> if I'm wrong.
>>>>
>>>> Thank you very much for your time and suggestion.
>>>>
>>>> Best regards,
>>>> KJ Lim
>>>>
>>>>
>>>>
>>>>
>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Dear KJ Lim,
>>>>>
>>>>> I don't quite understand your question, because you seem to be asking
>>>>> for
>>>>> something that isn't a test for differential expression, which is what
>>>>> edgeR
>>>>> does. You question "I would like to learn the genes are express in
>>>>> treeHS
>>>>> but not treeLS and vice versa" seems to be about absolute expression
>>>>> levels.
>>>>> edgeR doesn't test for genes that are not expressed in particular
>>>>> conditions.
>>>>>
>>>>> I'll refer you to the limma section on interaction models in case that
>>>>> helps, see Section 8.5 of:
>>>>>
>>>>>
>>>>>
>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>>>>
>>>>> Setting up a design matrix is the same for edgeR as it is for limma.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300
>>>>>> From: KJ Lim <jinkeanlim at gmail.com>
>>>>>> To: Bioconductor mailing list <bioconductor at r-project.org>
>>>>>> Subject: [BioC] edgeR: design matrix for different condition
>>>>>>
>>>>>> Dear the edgeR community,
>>>>>>
>>>>>> Good day.
>>>>>>
>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2
>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H,
>>>>>> 24H, 96H).
>>>>>>
>>>>>> I first assumed that the treatment effect will be the same for each
>>>>>> genotype and I have the design matrix as:
>>>>>>
>>>>>> design <- model.matrix(~tree+treat)
>>>>>>
>>>>>>> design
>>>>>>
>>>>>>
>>>>>>
>>>>>> (Intercept) treat03H treat24H treat96H treeLS
>>>>>> 1 1 0 0 0 0
>>>>>> 2 1 0 0 0 0
>>>>>> 3 1 1 0 0 0
>>>>>> 4 1 1 0 0 0
>>>>>> 5 1 0 1 0 0
>>>>>> 6 1 0 1 0 0
>>>>>> 7 1 0 0 1 0
>>>>>> 8 1 0 0 1 0
>>>>>> 9 1 0 0 0 1
>>>>>> 10 1 0 0 0 1
>>>>>> 11 1 1 0 0 1
>>>>>> 12 1 1 0 0 1
>>>>>> 13 1 0 1 0 1
>>>>>> 14 1 0 1 0 1
>>>>>> 15 1 0 0 1 1
>>>>>> 16 1 0 0 1 1
>>>>>>
>>>>>> I used coef=4 to test for the differential expressions between treeHS
>>>>>> and treeLS within the time of treatment, coef=3 to learn the
>>>>>> differential expressions in 2 genotypes at time of treatment 96H.
>>>>>>
>>>>>> ** I would like to learn the genes are express in treeHS but not
>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or
>>>>>> across the whole time of treatment, should I have the design matrix as
>>>>>> below or more steps I need to do?
>>>>>>
>>>>>> design <- model.matrix(~tree*treat)
>>>>>>
>>>>>> Kindly please light me on this. I appreciate very much for your help
>>>>>> and
>>>>>> time.
>>>>>>
>>>>>> Have a nice day.
>>>>>>
>>>>>> Best regards,
>>>>>> KJ Lim
>>>
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>
>>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}
More information about the Bioconductor
mailing list