[BioC] Questions on doing EdgeR Analysis of Timeseries Data

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jun 11 03:14:29 CEST 2011


Hi Mark,

This is an analysis question rather than a devel suggestion, so I've 
transferred it from Bioc-sig-seq to Bioconductor.  Hope you don't mind.

If you have divided your samples into two groups, and you have 7 samples, 
then I recommend you set prior.n=4.  See

https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2011-June/002042.html

for an explanation.

I can't answer your other questions without knowing whether you have 
biological replicates in your experiment.  You have 7 samples.  Do these 
samples all correspond to distinct lifecycle stages, or are some stages 
observed more than once?  How many stages are there?

Best wishes
Gordon

> Date: Thu, 9 Jun 2011 18:19:42 +0000
> From: "Lawson, Mark (mjl3p)" <mjl3p at virginia.edu>
> To: "bioc-sig-sequencing at r-project.org"
> 	<bioc-sig-sequencing at r-project.org>
> Subject: [Bioc-sig-seq] Questions on doing EdgeR Analysis of
> 	Timeseries Data
>
> (I apologize if this message was sent more than once)
>
> Dear Analysis Gurus
>
> I am currently performing a gene expression analysis on a plant 
> parasite. I have mapped Illumina read counts for various stages in this 
> parasites lifecycle. Of interest for us in this analysis are genes that 
> are differentially expressed during these lifecycles. To determine this, 
> I have focused on two types of differential expression: "peaks" and 
> "cliffs." "Peaks" occur when a gene is differentially expressed in one 
> time sample (either higher or lower than the remaining samples) and 
> "cliffs" occur when a gene is differentially expressed between two 
> groups of sample (for instance higher expression in the first three 
> samples than the last three).
>
> To determine these peaks and cliffs, I have been creating groups in 
> which the desired peak/cliff is "case" and the remaining samples are 
> "control." I then run common dispersion and/or tagwise dispersion and 
> extract those reads with an FDR of less than 0.1. So, my questions:
>
> 1.) How much filtering of data should I do? Right now I have a fair 
> amount of genes that are expressed in 0, 1, 2 etc. samples. It seems 
> logical that I would filter out genes that have no expression, but at 
> what level should it stop? Also, should there be different filtering 
> depending on the analysis (peak or cliff)?
>
> 2.) When doing tagwise dispersion, what should I set my prior.n to (I 
> currently have 7 samples)? Does it depend on the type of analysis?
>
> 3.) Should I investigate using a more advanced glm based analysis? Any 
> advice on crafting a design for this?
>
> 4.) Any other ideas on analyses to perform on a set of timeseries data 
> with EdgeR?
>
> I greatly appreciate any help/advice and thank you in advance!
>
>
> Mark J. Lawson, Ph.D.
> Bioinformatics Research Scientist
> Center for Public Health Genomics, UVA
> mlawson at virginia.edu

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list