[BioC] Using summarizeOverlaps with multiple samples/readgroups in a single bam file?

Martin Morgan mtmorgan at fhcrc.org
Sat Jan 12 21:53:36 CET 2013


On 1/12/2013 12:29 PM, Ryan C. Thompson wrote:
> Hi all,
>
> I'm looking at simplifying my differential expression pipeline a little bit by
> merging all my input bam files into one bam file with multiple samples/read
> groups and then using that bam file as input to summarizeOverlaps. Is this
> supported in any way? I've never worked with sam read groups before (I always
> just did one sample per file), so I don't really know anything about them.
>
> So is it supported to take a single bam file and use summarizeOverlaps or some
> other mechanism to get a SummarizedExperiment object with one column for each
> sample in the bam file, rather than one column per file?

Rsamtools doesn't do anything special with read groups (e.g., no pre-filtering),
and summarizeOverlaps doesn't do per-read-group counting (though one can provide
one's own counting function to summarizeOverlaps). Also, parallelizing over bam
files is a simple way to get better throughput: providing a BamFileList as the
second argument to summarizeOverlaps, with 'parallel' on the search path,
currently uses mclapply and memory-efficient iteration to populate the
SummarizedExperiment. So in some ways one large bam file is a step in a
counter-productive direction.
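
If a merged file is what you have, per-read-group counts can still be assembled by hand. What follows is only a rough sketch, not a built-in feature: it assumes a current Bioconductor where readGAlignments and summarizeOverlaps live in GenomicAlignments, that every read carries an RG tag, and that the whole file fits in memory. The helper name countByReadGroup and the file name "merged.bam" are made up for illustration.

```r
library(GenomicAlignments)  # provides readGAlignments() and summarizeOverlaps()
library(Rsamtools)          # provides ScanBamParam()

## Hypothetical helper, not part of any package: count reads per feature
## separately for each read group found in the 'RG' metadata column.
countByReadGroup <- function(features, gal) {
    rg <- mcols(gal)$RG
    if (is.null(rg))
        stop("no RG column; read the file with ScanBamParam(tag = \"RG\")")
    rg <- as.character(rg)
    rg[is.na(rg)] <- "unassigned"        # reads without a read group
    idx <- split(seq_along(gal), rg)     # read indices, one set per group
    counts <- lapply(idx, function(i) {
        ## default counting mode (Union), one column per call
        se <- summarizeOverlaps(features, gal[i])
        assay(se)[, 1]
    })
    do.call(cbind, counts)               # features x read groups
}

## Usage (assumes 'features' is a GRanges and "merged.bam" exists):
## gal <- readGAlignments("merged.bam", param = ScanBamParam(tag = "RG"))
## counts <- countByReadGroup(features, gal)
```

The result is a plain matrix with one column per read group; it could be wrapped in a SummarizedExperiment afterwards if that container is needed. Because everything is read at once, this gives up the memory-efficient iteration described above, which is another reason to prefer one bam file per sample.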

Martin

>
> -Ryan Thompson
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109


