[BioC] Rsamtools bam to ShortReadQ for export

Martin Morgan mtmorgan at fhcrc.org
Tue Nov 27 23:26:23 CET 2012


On 11/27/2012 01:17 PM, Marcus Davy wrote:
> Hi,
> I would like to export certain reads filtered using Rsamtools, for example
> aligning against contaminant references
> to exclude a set of sequences filtering on the bitwise flag=0x4 (similar to
> SamToFastq.jar in the picard http://picard.sourceforge.net/ tools),
> but couldn't find much on exporting/writing to file in the
> Rsamtools-Overview.pdf vignette or on google.
>
> Is there a function/utility which converts an Rsamtools bam list region to
> ShortReadQ object(s) for export using writeFastq()
> with single end *and* paired end alignments which have been read in via the
> Rsamtools interface?
>
> ## Load a bam
> bam <- scanBam(...)
>
> ## Filter using R's subsetting,
> bamRegion <- bam[[1]][ ... ] ## list of list
>
> ## Should work for single end alignments, but not paired end alignments
> ## Read pairs may have to index on mpos = PNEXT field
> rfq <- ShortReadQ(bamRegion[["seq"]], bamRegion[["qual"]],
> BStringSet(bamRegion[["qname"]]))
>
> If not is the area of exporting from Rsamtools worth developing futher?

I think I would use readBamGappedAlignments, specifying what=c("seq", "qual") 
and maybe "qname", flag=scanBamFlag(<...>) and perhaps the tag argument to 
filter / load relevant records, and then create a fastq object ShortReadQ(<...>) 
and writeFastq.

This could be made space efficient using BamFile(, yieldSize=) and iterating through

   bf = open(BamFile(...))
   out = FastqFile("..."
   while (nrow(ga <- readBamGappedAlignments(bf, ...))) {
      ## filter, create fq = ShortReadQ
      writeFastq(fq, out, mode ="a")
   }
   close(bf)
   close(out)

>
> cheers,
>
> Marcus
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list