[BioC] edgeR glm
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Nov 20 04:27:13 CET 2012
Dear Upendra,
Statistical theory warns strongly against testing for main effects in a
linear model when interactions are present, because it is too difficult to
interpret exactly what you are testing. This what you are doing with:
glmLRT(data.int,fit.int,coef=2:3)
Unless you are very familiar and confident with interaction model formula,
you may be better off following the advice in Chapter 3 of the edgeR
User's Guide:
http://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
Best wishes
Gordon
PS. Are you working with the current release of edgeR and Bioconductor?
------------ original message -----------
[BioC] edgeR glm
upendra kumar devisetty udevisetty at ucdavis.edu
Sun Nov 18 16:36:02 CET 2012
Hi Gordon,
For my experiment, I have three levels of factors for genotype ("B2,
"Ein40","Ein9") and two levels of factors as treatment ("Sun", "Shade"). I
did a nested interaction method to detect the differential gene
expression.
Basically i am interested in the following:
1. Those genes that are differentially expressed among all three
genotypes.
2. Those genes that are differentially expressed between wild-type (B2)
and two mutants(Ein40 & Ein9) for Shade treatment.
Here is my code for the above:
library(edgeR)
library(reshape)
counts <- read.delim("sam2countsResults.tsv",row.names=NULL)
head(counts)
counts<-counts[counts$gene!="*",]
row.names(counts) <- counts$gene
counts <- counts[,-1]
head(counts)
counts[is.na(counts)] <- 0
names(counts)
summary(counts)
sort(colSums(counts))
counts <- counts[,colSums(counts) > 100000]
names(counts)
dim(counts)
samples <- data.frame(
file=names(counts),
gt=unlist(strsplit(names(counts),"[_\\.]"))[c(1,7,13,19,25,31,37,43,49,55,61,67,73,79,85,91,97,103)],
trt=unlist(strsplit(names(counts),"[_\\.]"))[c(3,9,15,21,27,33,39,45,51,57,63,69,75,81,87,93,99,105)],
rep=unlist(strsplit(names(counts),"[_\\.]"))[c(4,10,16,22,28,34,40,46,52,58,64,70,76,82,88,94,100,106)]
)
data.int <- DGEList(counts,group=group)
data.int$samples
data.int <- data.int[rowSums(cpm(data.int) > 1) >= 3,]
dim(data.int)
data.int <- calcNormFactors(data.int)
design.int <- model.matrix(~gt*trt)
rownames(design.int) <- colnames(data.int)
design.int
data.int <- estimateGLMCommonDisp(data.int,design.int)
data.int <- estimateGLMTrendedDisp(data.int,design.int)
fit.int <- glmFit(data.int,design.int)
colnames(fit.int$design)
> colnames(fit.int$design)
[1] "(Intercept)" "gtein40" "gtein9" "trtSun"
"gtein40:trtSun"
[6] "gtein9:trtSun"
For my objective 1 i am doing the following:
int.lrt <- glmLRT(data.int,fit.int,coef=2:3)
topTags(int.lrt)
and for my objective 2 i am doing the following:
int.lrt <- glmLRT(data.int,fit.int,coef=5:6)
topTags(int.lrt)
Am i doing correct?
Thanks in advance for your help.
Upendra
-------------------------------------------------
Upendra Kumar Devisetty, Ph.D
Postdoctoral Scholar
Dept. of Plant Biology, UC Davis
Davis, CA 95616
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list