[Bioc-devel] GenomicAlignments: using asMates=TRUE and yieldSize with paired-end BAM files

Valerie Obenchain vobencha at fhcrc.org
Wed Mar 19 18:04:59 CET 2014


Hi Mike,

You no longer need to sort Bam files to use the pairing algo or 
yieldSize. The readGAlignment* functions now work with both constraints 
out of the box.

Create a BamFile with yieldSize and indicate you want mates.
bf <- BamFile(fl, yieldSize=10000, asMates=TRUE)

Maybe set some specifications in a param:
param <- ScanBamParam(what = c("qname", "flag"))

Then call either readGAlignment* method that handles pairs:
readGAlignmentsList(bf, param=param)
readGAlignmentPairs(bf, param=param)

For summarizeOverlaps():
summarizeOverlaps(annotation, bf, param=param, singleEnd=FALSE)

We've considered removing the 'obeyQname' arg and documentation but 
thought the concept may be useful in another application. I'll revisit 
the summarizeOverlaps() documentation to make sure 'obeyQname' is 
downplayed and 'asMates' is encouraged.

Valerie



On 03/19/14 07:39, Michael Love wrote:
> hi,
>
>>From last year, in order to use yieldSize with paired-end BAMs, I
> should sort the BAMs by qname and then use the following call to
> BamFile:
>
> library(pasillaBamSubset)
> fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
> bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE)
>
> https://stat.ethz.ch/pipermail/bioconductor/2013-March/051490.html
>
> If I want to use GenomicAlignments::readGAlignmentsList with
> asMates=TRUE and respecting the yieldSize, what is the proper
> construction? (in the end, I want to use summarizeOverlaps on
> paired-end BAMs while respecting the yieldSize)
>
> library(pasillaBamSubset)
> fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
> bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE, asMates=TRUE)
> x <- readGAlignmentsList(bf)
> Warning message:
>   In scanBam(bamfile, ..., param = param) :
>     'obeyQname=TRUE' ignored when 'asMates=TRUE'
>   Calls: readGAlignmentsList ... .matesFromBam ->
> .load_bamcols_from_bamfile -> scanBam -> scanBam
>
> I see in the man pages for summarizeOverlaps it has:
>
> "In Bioconductor > 2.12 it is not
> necessary to sort paired-end BAM files by ‘qname’. When
> counting with ‘summarizeOverlaps’, setting ‘singleEnd=FALSE’
> will trigger paired-end reading and counting."
>
> but I don't see how this can respect the specified yieldSize, because
> readGAlignmentsList has to read in as many reads as necessary to find
> the mate.
>
> Sorry in advance if I am missing something in the documentation!
>
> Mike
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



More information about the Bioc-devel mailing list