[BioC] edgeR - GLM for multifactors
James W. MacDonald
jmacdon at uw.edu
Tue Jun 11 16:06:19 CEST 2013
Hi Alan,
On 6/11/2013 12:14 AM, Alan Smith wrote:
> Hello List,
>
> I'm trying to perform DE analysis using edgeR on the following samples with
> 2 replicates per condition.
>
> WC - Wildtype or Normal control
> WI - Wildtype or Normal upon Infection
> LC - Dose1 control
> LI - Dose1 Infection
> HC - Dose2 control
> HI - Dose2 Infection
>
> MY design matrix:
> HC HI LC LI WC WI
> WC1 0 0 0 0 1 0
> WC2 0 0 0 0 1 0
> WI1 0 0 0 0 0 1
> WI2 0 0 0 0 0 1
> LC1 0 0 1 0 0 0
> LC2 0 0 1 0 0 0
> LI1 0 0 0 1 0 0
> LI2 0 0 0 1 0 0
> HC1 1 0 0 0 0 0
> HC2 1 0 0 0 0 0
> HI1 0 1 0 0 0 0
> HI2 0 1 0 0 0 0
>
> Cond.contrasts<- makeContrasts(DoseEffect1=(LI-LC)-(WI-WC),
> DoseEffect2=(HI-HC)-(WI-WC), OverallDoseEffect=(HI+HC+LI+LC)-(WI+WC),
> InfectionEffect=(WI+HI+LI)-(WC+HC+LC), levels=design)
>
> With the code below I believe I managed to get DE genes in one to one
> comparison but I don't think I got through with the overall dose effect and
> infection effect (contrast made as above).
> Please go over the condition level contrasts and correct me if I'm wrong.
You are wrong ;-D
For a contrast to be valid, the coefficients are supposed to add up to
zero. So as an example, let's look at your first contrast:
(LI-LC)-(WI-WC)
this can be written as
LI - LC - WI + WC
or more explicitly as
1*LI - 1*LC - 1*WI + 1*WC
and the coefficients are then
1-1-1+1
which adds up to zero. However in your third contrast, the coefficients
don't add up to zero. In these instances (where you are trying to
compare the mean of one group to the mean of another), it is easiest to
just divide by the number of groups. So you would want
OverallDoseEffect=(HI+HC+LI+LC)/4-(WI+WC)/2
I think the fourth contrast is OK, as the coefficients add up to zero,
but you can always divide each group by three, which will at least (in
my mind) explicitly indicate that you are looking at the comparison of
the means for each group.
Best,
Jim
>
> Hope this information is not too little/confusing to understand the problem.
>
> Appreciate your time and help.
>
> Thank you,
> Alan
>
>
> library(edgeR)
> xim<- read.delim("NreadCounts.txt", row.names=1, stringsAsFactors=FALSE)
> names(x)
> group = factor(c("WC", "WC", "WI", "WI", "LC", "LC", "LI", "LI", "HC",
> "HC", "HI", "HI"))
> y<- DGEList(counts=x, group=group)
> keep<- rowSums (cpm(y)>1)>= 3
> y2<- y[keep,]
> dim(y2)
> [1] 31865 12
> y2$samples$lib.size<- colSums(y2$counts)
> y3<- calcNormFactors (y2)
> levels (y3$samples$group)
> [1] "HC" "HI" "LC" "LI" "WC" "WI"
> design<- model.matrix(~0+group, data=y3$samples)
> colnames(design)<- levels(group)
> y3<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
> y3<- estimateGLMTrendedDisp(y3, design)
> y3<- estimateGLMTagwiseDisp(y3, design)
> fit<- glmFit(y3, design)
> OneONone.contrasts<- makeContrasts(HIvsHC=HI-HC, LIvsLC=LI-LC,
> WIvsWC=WI-WC, HCvsWC=HC-WC, LCvsWC=LC-WC, HIvsWI=HI-WI, LIvsWI=LI-WI,
> levels=design)
>> OneONone.contrasts
> Contrasts
> Levels HIvsHC LIvsLC WIvsWC HCvsWC LCvsWC HIvsWI LIvsWI
> HC -1 0 0 1 0 0 0
> HI 1 0 0 0 0 1 0
> LC 0 -1 0 0 1 0 0
> LI 0 1 0 0 0 0 1
> WC 0 0 -1 -1 -1 0 0
> WI 0 0 1 0 0 -1 -1
>
> Cond.contrasts<- makeContrasts(DoseEffect1=(LI-LC)-(WI-WC),
> DoseEffect2=(HI-HC)-(WI-WC), OverallDoseEffect=(HI+HC+LI+LC)-(WI+WC),
> InfectionEffect=(WI+HI+LI)-(WC+HC+LC), levels=design)
> Cond.contrasts
> Contrasts
> Levels DoseEffect1 DoseEffect2 OverallDoseEffect InfectionEffect
> HC 0 -1 1 -1
> HI 0 1 1 1
> LC -1 0 1 -1
> LI 1 0 1 1
> WC 1 1 -1 -1
> WI -1 -1 -1 1
>> lrt.OverallDoseEffect<- glmLRT(fit,
> contrast=Cond.contrasts[,"OverallDoseEffect"])
>> summary(de<-decideTestsDGE(lrt.OverallDoseEffect))
> [,1]
> -1 31865
> 0 0
> 1 0
>> lrt.InfectionEffect<- glmLRT(fit,
> contrast=Cond.contrasts[,"InfectionEffect"])
>> summary(de<-decideTestsDGE(lrt.InfectionEffect))
> [,1]
> -1 270
> 0 31108
> 1 487
>
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] edgeR_3.2.3 limma_3.16.5
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list