[BioC] edgeR: finding tissue specific genes [was: Design matrix and BCV]
Gordon K Smyth
smyth at wehi.EDU.AU
Tue May 7 08:39:14 CEST 2013
Dear Manoj,
Why not simply find genes than are higher in one group than the average of
the other groups? edgeR can do this sort of thing easily.
Let's suppose suppose you going to using the quasi-lik approach of
glmQFTest() rather than glmQRT().
First define a design matrix for which the intercept is the overall mean:
contrasts(tiss_groups) <- contr.sum(tiss_groups)
design <- model.matrix(~tiss_groups)
Then estimate the trended dispersions:
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTrendedDisp(y, design)
Then fit the basic linear model:
fit <- glmFit(y, design)
Then you can extract all the lists you want. For example
ql <- glmQLFTest(fit, coef=2)
top1 <- topTags(ql)
will give you genes specifically up or specifically down in tissue 1, as
compared to the average of all the other groups.
ql <- glmQLFTest(fit, coef=3)
top2 <- topTags(ql)
will give you genes specifically up/down in tissue 2, and so on up to
ql <- glmQLFTest(fit, coef=18)
top17 <- topTags(de)
will give you genes specifically up/down in tissue 17. Finally, to get
genes up/down in tissue 18:
cont <- rep(-1,18)
cont[1] <- 0
ql <- glmQLFTest(fit, contrast=cont)
top18 <- topTags(de)
What you propose doesn't quite make sense to me. If you want to put genes
on the same scale (and you don't need to for the above analysis), would it
not be better to use rpkm()?
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Mon, 6 May 2013, Manoj Hariharan wrote:
> Thanks Gordon.
I was wondering if I could have a quantitative value for the deviance of
each group from the average, for each of the DE genes. I understand that
the F value (from the F-statistic) is a measure of how far the gene is
compared to the expression of all others across the samples.
One option, I could think of is to just get the normalized counts for each of the sample, for the set of DE genes:
de_lrt <- rownames(top_lrt[top_lrt$FDR<0.05,])
scale <- D$samples$lib.size*D$samples$norm.factors
normCounts <- round(t(t(D$counts)/scale)*mean(scale))
write.table(log(normCounts[de_lrt[1:5690],]+1), "All37_NormCounts_DEGenes", sep="\t", quote=FALSE)
Essentially, I am trying to get the list of genes that shows a more
"tissue-specific" behaviour. Most genes are not expressed strictly in one
particular tissue - there would be related tissues where its expression
would be almost similar. So I would like to rank them based on their
expression values and for that I need to have all comparable values.
Then, I could consider those samples where the expression of the gene is
higher than the 90th percentile. Hope that makes sense!
Thanks,
Manoj.
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}
More information about the Bioconductor
mailing list