[BioC] Differential expression in more than 2 samples using NGS?

Xiaohui Wu wux3 at muohio.edu
Tue Aug 24 22:27:50 CEST 2010


Hi Martin,

Thank you very much for your response. 
I'm reading the chipseq manual now; it introduces the peak detection process you suggested, including slice().

What I mean by multiple samples is this: I have 8 libraries for 4 tissues, with two replicates per tissue, and I want to know which genes are differentially expressed (DE) among these 4 tissues. If I have to compare two tissues at a time, then for 4 tissues I need C(4,2) = 6 comparisons to cover every pair of tissues. So I would like to know whether any tool can compare many samples at once.
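
The pairwise count above can be checked with a minimal Python sketch. The tissue names are hypothetical placeholders, not the actual tissues in this data set; the sketch only enumerates the comparisons and performs no DE testing.

```python
from itertools import combinations

# Hypothetical tissue labels; the real tissue names are not given in the thread.
tissues = ["tissueA", "tissueB", "tissueC", "tissueD"]

# Every unordered pair of tissues is one two-group comparison.
pairs = list(combinations(tissues, 2))

for first, second in pairs:
    print(f"{first} vs {second}")

print(len(pairs), "pairwise comparisons")
```

The number of pairs grows quadratically with the number of tissues, which is part of why a single test across all groups is attractive.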

Xiaohui


-------------------------------------------------------------

On 08/24/2010 09:49 AM, Xiaohui Wu wrote:
> Hi all,
> 
> 
> I have about 30 libraries of SBS data (millions of 20nt tags) to
> analyze the differences between or among different libraries, and
> lots of these tags are in intergenic regions.
> 
> For gene regions, I think I can use DESeq or edgeR to find the DE
> genes. But it seems that DESeq and edgeR can only deal with two
> samples. Is there any package that compares multiple samples at once,
> for example to find genes expressed highly in one or some libraries
> but not in the others?
> 
> For intergenic tags, I think I should first use a peak detection
> package to find peaks in the intergenic regions, then treat these
> peaks as genes to find DE regions.
> 
> Is there a peak detection package for NGS, and a package for DE
> analysis among multiple libraries?

If your starting point is BAM files of ungapped alignments and you're
looking for flexibility in peak calling, you might start with
Rsamtools::scanBam() to extract the position and width of each
alignment, manipulate that into a GRanges object, use
IRanges::coverage() and IRanges::slice() and friends to identify and
summarize peaks.
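
To illustrate the idea behind the coverage-then-slice step, here is a minimal pure-Python sketch. It is not the Rsamtools or IRanges API: the function names `coverage` and `slice_peaks`, the 0-based half-open coordinates, and the toy alignments are all assumptions made for illustration.

```python
def coverage(starts, widths, length):
    """Per-base read depth from 0-based alignment starts and widths.

    Assumes every start lies within [0, length).
    """
    diff = [0] * (length + 1)
    for s, w in zip(starts, widths):
        diff[s] += 1                      # depth rises where a read begins
        diff[min(s + w, length)] -= 1     # and falls just past its end
    depth, running = [], 0
    for d in diff[:length]:
        running += d
        depth.append(running)
    return depth


def slice_peaks(depth, lower):
    """Half-open (start, end, max_depth) runs where depth >= lower.

    Assumes lower >= 1 so that the trailing sentinel closes any open run.
    """
    peaks, start = [], None
    for i, d in enumerate(depth + [0]):
        if d >= lower and start is None:
            start = i
        elif d < lower and start is not None:
            peaks.append((start, i, max(depth[start:i])))
            start = None
    return peaks


# Toy alignments (hypothetical): position and width of each read.
starts = [2, 3, 4, 12, 13]
widths = [5, 5, 5, 4, 4]

depth = coverage(starts, widths, 20)
peaks = slice_peaks(depth, lower=2)
print(peaks)
```

Each reported run could then be treated as a candidate region, with reads counted per library before testing for differences.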

It's unclear whether you mean more than two samples (handled by edgeR
and DESeq, I think) or more than one factor with two levels. In the
latter case, one approach is to use the normalization and transformation
methods offered by either package (e.g., getVarianceStabilizedData
from DESeq, I think) and to analyze the transformed values with
standard R methods, in the hope that the data are normal and
homoscedastic enough.
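
The message does not say which standard method to use. As one possibility for a single factor with several levels (for example, four tissues with two replicates each), a one-way ANOVA F statistic could be computed per gene on the transformed values. The sketch below is pure Python rather than R, the function name `one_way_f` is hypothetical, and the numbers are invented stand-ins for variance-stabilized expression values.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    n = sum(len(g) for g in groups)          # total number of observations
    k = len(groups)                          # number of groups (tissues)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented values for one gene: 4 tissues x 2 replicates,
# with the third tissue clearly higher than the rest.
shifted_gene = [[5.0, 5.2], [5.1, 4.9], [8.0, 8.2], [5.0, 5.2]]

# Invented values for a gene with no tissue effect.
flat_gene = [[5.0, 5.2], [5.1, 5.1], [5.2, 5.0], [5.1, 5.1]]

f_shifted = one_way_f(shifted_gene)
f_flat = one_way_f(flat_gene)
print(f_shifted, f_flat)
```

A large F suggests that at least one tissue differs. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, and with only two replicates per group the normality and equal-variance assumptions are hard to check.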

Hopefully others will answer with better advice.

Martin

> 
> Thank you!
> 
> Regards, Xiaohui


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793