[BioC] EdgeR multi-factor design questions
James W. MacDonald
jmacdon at uw.edu
Mon Mar 25 21:01:24 CET 2013
Hi Morgan,
On 3/25/2013 3:15 PM, Morgan Mouchka wrote:
> Hello all,
>
> I’m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual.
>
>
> More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). I’m interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond?
>
>
> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question.
>
> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but I’m uncertain as to the best way to set this up.
This is definitely an interaction term. How you set it up is dependent
on what parameterization makes the most sense to you. Following your
method below, you want
glmLRT(fit, contrast = c(1,-1,-1,1))
or you could do
state <- factor(rep(1:2, each = 8))
treat <- factor(rep(1:2, each=4, times=2))
design <- model.matrix(~treat*state)
and then
glmLRT(fit, coef=4)
will test the interaction, and coefficients 2 and 3 test the main
effects for treatment and state, as you have done below.
Best,
Jim
>
> I’m using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment.
>
> Thanks so much,
>
> Morgan
>
> Morgan Mouchka
> PhD Candidate
> E231 Corson Hall
> Dept. Ecology and Evolutionary Biology
> Cornell University
> mep74 at cornell.edu
> http://www.eeb.cornell.edu/mouchka
>
>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>> head (x)
> AC3 AC4 AC5 AC7 AT1 AT2
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470
> AT4 AT5 SC1 SC2 SC3 SC5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194
> ST1 ST2 ST3 ST5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221
>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>> y<- DGEList (counts = x, group = group)
> Calculating library sizes from column totals.
>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>> keep<- rowSums (cpm(y)>1)>= 4
>> y2<- y[keep,]
>> dim(y2)
> [1] 17739 16
>> y2$samples$lib.size<- colSums(y2$counts)
>> # Normalize for sample-specific effects
>> y3<- calcNormFactors (y2)
>> y3$samples
> group lib.size norm.factors
> AC3 A 11095093 1.0237872
> AC4 A 5604841 0.9912734
> AC5 A 7517691 0.9824405
> AC7 A 5918039 0.9919439
> AT1 B 6004830 0.9609672
> AT2 B 7692419 0.9543765
> AT4 B 5197815 0.9575609
> AT5 B 7759281 0.9643070
> SC1 C 3542875 1.0693587
> SC2 C 6720709 1.0488297
> SC3 C 3881774 1.0022528
> SC5 C 5109584 1.0429925
> ST1 D 10429666 0.9815467
> ST2 D 9662842 1.0103866
> ST3 D 8267838 1.0198095
> ST5 D 10698815 1.0069064
>> #plot biological coefficient of variation (BVC)between samples
>> plotMDS(y3)
>> #check levels
>> levels (y3$samples$group)
> [1] "A" "B" "C" "D"
>> #design matrix to describe treatment conditions
>> design<- model.matrix(~0+group, data=y3$samples)
>> colnames(design)<- c("A", "B", "C", "D")
>> design
> A B C D
> AC3 1 0 0 0
> AC4 1 0 0 0
> AC5 1 0 0 0
> AC7 1 0 0 0
> AT1 0 1 0 0
> AT2 0 1 0 0
> AT4 0 1 0 0
> AT5 0 1 0 0
> SC1 0 0 1 0
> SC2 0 0 1 0
> SC3 0 0 1 0
> SC5 0 0 1 0
> ST1 0 0 0 1
> ST2 0 0 0 1
> ST3 0 0 0 1
> ST5 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
> Disp = 0.02061 , BCV = 0.1436
>> #estimate gene-wise dispersion
>> y5<- estimateGLMTrendedDisp(y4, design)
>> y6<- estimateGLMTagwiseDisp(y5,design)
>> plotBCV(y6)
>> fit<- glmFit(y6, design)
>> # make contrasts
>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>> #pairwise comparison of B and A (apo treatmetn and apo control)
>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>> # summary of model fitting funtions
>> summary(de<-decideTestsDGE(lrt.BvsA))
> [,1]
> -1 562
> 0 16091
> 1 1086
>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>> summary(de<-decideTestsDGE(lrt.DvsC))
> [,1]
> -1 496
> 0 16710
> 1 533
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list