[Bioc-sig-seq] Questions on doing EdgeR Analysis of Timeseries Data

Wed Jun 8 16:15:53 CEST 2011

Dear Analysis Gurus,

I am currently performing a gene expression analysis on a plant parasite. I have mapped Illumina read counts for various stages of this parasite's life cycle. We are interested in genes that are differentially expressed across these stages. To find them, I have focused on two patterns of differential expression: "peaks" and "cliffs." A "peak" occurs when a gene is differentially expressed in one time sample (either higher or lower than in the remaining samples), and a "cliff" occurs when a gene is differentially expressed between two groups of samples (for instance, higher expression in the first three samples than in the last three).
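
To make the two patterns above concrete, here is a minimal sketch of how each one can be written as a labeling of the time-ordered samples. It is illustration only and does not call edgeR; the function names `peak_groups` and `cliff_groups`, the "case"/"control" labels, and the assumption that a cliff splits all samples at one point are mine for this example.

```python
# Illustrative only: builds sample labelings for time-ordered samples.
# Function names and the split-at-one-point cliff are assumptions of this sketch.

def peak_groups(n_samples, peak_index):
    """Label one sample 'case' and every other sample 'control'."""
    return ["case" if i == peak_index else "control" for i in range(n_samples)]

def cliff_groups(n_samples, split):
    """Label the first `split` samples 'case' and the rest 'control'."""
    return ["case" if i < split else "control" for i in range(n_samples)]

n = 7  # number of samples
all_peaks = [peak_groups(n, i) for i in range(n)]       # one labeling per sample
all_cliffs = [cliff_groups(n, k) for k in range(1, n)]  # one labeling per split point

print(peak_groups(n, 2))
print(cliff_groups(n, 3))
```

Note that under this labeling a cliff with a split after the first sample is the same comparison as a peak in the first sample, so the two sets of tests overlap at the ends of the series.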

To detect these peaks and cliffs, I have been creating groups in which the samples forming the desired peak/cliff are "case" and the remaining samples are "control." I then run the analysis with common dispersion and/or tagwise dispersion and extract those genes with an FDR of less than 0.1. So, my questions:

1.) How much filtering of the data should I do? Right now I have a fair number of genes that are expressed in only 0, 1, 2, etc. samples. It seems logical to filter out genes that have no expression at all, but at what level should the filtering stop? Also, should the filtering differ depending on the analysis (peak or cliff)?

2.) When doing tagwise dispersion, what should I set my prior.n to (I currently have 7 samples)? Does it depend on the type of analysis?

3.) Should I investigate using a more advanced GLM-based analysis? Any advice on crafting a design matrix for this?

4.) Any other ideas on analyses to perform on a set of time-series data with edgeR?
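
To clarify what I mean by filtering in question 1, here is a minimal sketch of the kind of rule I am considering: keep a gene only if it has at least some minimum count in at least some minimum number of samples. It does not call edgeR; the function name `keep_gene`, both thresholds, and the toy counts are hypothetical values chosen only for illustration.

```python
# Illustrative only: a per-gene expression filter on raw counts.
# keep_gene, its default thresholds, and the counts below are hypothetical.

def keep_gene(counts, min_count=1, min_samples=2):
    """True if at least `min_samples` samples have a count >= `min_count`."""
    expressed = sum(1 for c in counts if c >= min_count)
    return expressed >= min_samples

# Toy data for 7 samples, not real measurements.
counts_by_gene = {
    "geneA": [0, 0, 0, 0, 0, 0, 0],   # never expressed
    "geneB": [0, 12, 0, 0, 0, 0, 0],  # expressed in a single sample
    "geneC": [5, 8, 0, 3, 0, 0, 9],   # expressed in several samples
}

kept = [gene for gene, counts in counts_by_gene.items() if keep_gene(counts)]
print(kept)
```

With `min_samples=2`, a gene like `geneB` that is expressed in only one sample is dropped, even though that is exactly the shape a "peak" would have. This is why I suspect the threshold may need to differ between the peak and cliff analyses.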

I greatly appreciate any help/advice and thank you in advance!

Mark J. Lawson, Ph.D.
Bioinformatics Research Scientist
Center for Public Health Genomics, UVA
mlawson at virginia.edu