[BioC] edgeR : input and design matrix help
Mark Robinson
mark.robinson at imls.uzh.ch
Fri May 18 22:31:35 CEST 2012
On 18.05.2012, at 12:11, KJ Lim wrote:
> Dear Prof Mark Robinson,
>
> Good day.
>
>> treat <-
>
>>
>> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
>>
>> Maybe it's better to call this 'time' and you'll need to change this to
>> something like:
>> time <- factor( rep(c("C","3h","24h","48h"), each=4) )
>>
>
> I assigned the time factor as suggested and it showed me as:
>
>> time
> [1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h
> Levels: 24h 3h 96h C
>
> However, may I ask, does this time factor vector matches the read counts of
> time in the input file?
> The read counts of time in the input file is: C C 3h 3h 24h 24h 96h 96h
> C C 3h 3h 24h 24h 96h 96h
>
> Should I make some adjustment in my input file in order to suit the time
> factor vector? Please forgive me, if this is a stupid question.
Apologies, my mistake … change that to:
time <- factor( rep(rep(c("C","3h","24h","48h"),each=2),2) )
Best,
Mark
> If I understand your description, you are interested primarily in the
>> differences in genotypes. I would suggest starting with:
>>
>> design <- model.matrix(~time+geno)
>>
>
> I would like to study the differentially expressed genes in both 2 HS and
> LS genotypes of tress across time course experiment.
>
> If I would like to study the differentially expressed genes in HS genotype
> tress across time course, can I have the design matrix as below?
>
> design <- model.matrix(~time, data=rawdata)
>
> Thank you very much for your time and help.
>
> Best regards,
> KJ Lim
>
>
>
> On 18 May 2012 11:32, Mark Robinson <mark.robinson at imls.uzh.ch> wrote:
>
>> Hi KJ Lim,
>>
>>> trees <-
>>>
>> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))
>>
>> This one seems ok, although could be written more compactly and I'll give
>> it a different name:
>>
>> geno <- factor( rep(c("HS","LS"),each=8))
>>
>>> treat <-
>>>
>> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
>>
>> Maybe it's better to call this 'time' and you'll need to change this to
>> something like:
>>
>> time <- factor( rep(c("C","3h","24h","48h"), each=4) )
>>
>> These two factor vectors match the columns of your count table, so you
>> should be ok there.
>>
>> If I understand your description, you are interested primarily in the
>> differences in genotypes. I would suggest starting with:
>>
>> design <- model.matrix(~time+geno)
>>
>>> design
>> (Intercept) time3h time48h timeC genoLS
>> 1 1 0 0 1 0
>> 2 1 0 0 1 0
>> 3 1 0 0 1 0
>> 4 1 0 0 1 0
>> 5 1 1 0 0 0
>> 6 1 1 0 0 0
>> 7 1 1 0 0 0
>> 8 1 1 0 0 0
>> 9 1 0 0 0 1
>> 10 1 0 0 0 1
>> 11 1 0 0 0 1
>> 12 1 0 0 0 1
>> 13 1 0 1 0 1
>> 14 1 0 1 0 1
>> 15 1 0 1 0 1
>> 16 1 0 1 0 1
>> attr(,"assign")
>> [1] 0 1 1 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$time
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$geno
>> [1] "contr.treatment"
>>
>> Then, you can follow the usual steps as per the edgeR user's guide --
>> estimate dispersion, fit the glm with glmFit() and do the LR testing with
>> glmLRT() and so on.
>>
>> Hope that gets you started.
>>
>> Best,
>> Mark
>>
>>
>>
>>
>> On 18.05.2012, at 05:15, KJ Lim wrote:
>>
>>> Dear edgeR users,
>>>
>>> I am not a statistician nor R programming geek, please forgive me if I
>> ask
>>> stupid question.
>>>
>>> I have RNA-seq data for 2 different genotype of trees with different time
>>> points 0hour(control),3hr,24hours,and 48hours. Each time point has two
>>> replicates. The experiment design like following:
>>>
>>> Sample harvested after treatment at
>>> Tree H1 Ctrl 3hrs 24hrs 48hrs
>>> Tree H2 Ctrl 3hrs 24hrs 48hrs
>>>
>>> Tree L1 Ctrl 3hrs 24hrs 48hrs
>>> Tree L2 Ctrl 3hrs 24hrs 48hrs
>>>
>>> I would like to study genes that are differentially expressed (DE)
>>> throughout the time points in these 2 genotype of trees with edgeR. I
>> read
>>> from the edgeR user guide, the suitable DE analysis method for my
>> expriment
>>> is GLM likelihood ratio test.
>>>
>>> After read case study in the user guide, I have the RNA-Seq counts in a
>>> file as below in order to input into the edgeR package.
>>>
>>> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H
>>> L2_3H L1_1D L2_1D L1_2D L2_2D
>>> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
>>> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1
>>> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0
>>> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0
>>>
>>> I used the following commands to input the counts file into edgeR:
>>>
>>> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE,
>>> stringsAsFactor=FALSE)
>>> trees <-
>>>
>> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))
>>> treat <-
>>>
>> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
>>>
>>> I'm not good in R programming, thus, having this file input into the
>> edgeR
>>> and assign the factors as well as design-matrix is a challenge. I'm stuck
>>> how to tell the edgeR about my design matrix.
>>>
>>> Could you guys kindly help me on this? Have I input my RNA-Seq counts
>>> correctly? Please correct me if there is any mistake I have done in my
>>> edgeR input.
>>>
>>> I appreciate very much for your help and time. Have a nice day.
>>>
>>> Best regards,
>>> KJ Lim
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> ----------
>> Prof. Dr. Mark Robinson
>> Bioinformatics
>> Institute of Molecular Life Sciences
>> University of Zurich
>> Winterthurerstrasse 190
>> 8057 Zurich
>> Switzerland
>>
>> v: +41 44 635 4848
>> f: +41 44 635 6898
>> e: mark.robinson at imls.uzh.ch
>> o: Y11-J-16
>> w: http://tiny.cc/mrobin
>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list