[BioC] edgeR : input and design matrix help
Mark Robinson
mark.robinson at imls.uzh.ch
Fri May 18 10:32:03 CEST 2012
Hi KJ Lim,
> trees <-
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))
This one seems ok, although could be written more compactly and I'll give it a different name:
geno <- factor( rep(c("HS","LS"),each=8))
> treat <-
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
Maybe it's better to call this 'time' and you'll need to change this to something like:
time <- factor( rep(c("C","3h","24h","48h"), each=4) )
These two factor vectors match the columns of your count table, so you should be ok there.
If I understand your description, you are interested primarily in the differences in genotypes. I would suggest starting with:
design <- model.matrix(~time+geno)
> design
(Intercept) time3h time48h timeC genoLS
1 1 0 0 1 0
2 1 0 0 1 0
3 1 0 0 1 0
4 1 0 0 1 0
5 1 1 0 0 0
6 1 1 0 0 0
7 1 1 0 0 0
8 1 1 0 0 0
9 1 0 0 0 1
10 1 0 0 0 1
11 1 0 0 0 1
12 1 0 0 0 1
13 1 0 1 0 1
14 1 0 1 0 1
15 1 0 1 0 1
16 1 0 1 0 1
attr(,"assign")
[1] 0 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$time
[1] "contr.treatment"
attr(,"contrasts")$geno
[1] "contr.treatment"
Then, you can follow the usual steps as per the edgeR user's guide -- estimate dispersion, fit the glm with glmFit() and do the LR testing with glmLRT() and so on.
Hope that gets you started.
Best,
Mark
On 18.05.2012, at 05:15, KJ Lim wrote:
> Dear edgeR users,
>
> I am not a statistician nor R programming geek, please forgive me if I ask
> stupid question.
>
> I have RNA-seq data for 2 different genotype of trees with different time
> points 0hour(control),3hr,24hours,and 48hours. Each time point has two
> replicates. The experiment design like following:
>
> Sample harvested after treatment at
> Tree H1 Ctrl 3hrs 24hrs 48hrs
> Tree H2 Ctrl 3hrs 24hrs 48hrs
>
> Tree L1 Ctrl 3hrs 24hrs 48hrs
> Tree L2 Ctrl 3hrs 24hrs 48hrs
>
> I would like to study genes that are differentially expressed (DE)
> throughout the time points in these 2 genotype of trees with edgeR. I read
> from the edgeR user guide, the suitable DE analysis method for my expriment
> is GLM likelihood ratio test.
>
> After read case study in the user guide, I have the RNA-Seq counts in a
> file as below in order to input into the edgeR package.
>
> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H
> L2_3H L1_1D L2_1D L1_2D L2_2D
> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1
> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0
> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0
>
> I used the following commands to input the counts file into edgeR:
>
> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE,
> stringsAsFactor=FALSE)
> trees <-
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))
> treat <-
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
>
> I'm not good in R programming, thus, having this file input into the edgeR
> and assign the factors as well as design-matrix is a challenge. I'm stuck
> how to tell the edgeR about my design matrix.
>
> Could you guys kindly help me on this? Have I input my RNA-Seq counts
> correctly? Please correct me if there is any mistake I have done in my
> edgeR input.
>
> I appreciate very much for your help and time. Have a nice day.
>
> Best regards,
> KJ Lim
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland
v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin
More information about the Bioconductor
mailing list