[BioC] Rsamtools bam to ShortReadQ for export
Martin Morgan
mtmorgan at fhcrc.org
Tue Nov 27 23:26:23 CET 2012
On 11/27/2012 01:17 PM, Marcus Davy wrote:
> Hi,
> I would like to export certain reads filtered using Rsamtools, for example
> aligning against contaminant references
> to exclude a set of sequences filtering on the bitwise flag=0x4 (similar to
> SamToFastq.jar in the picard http://picard.sourceforge.net/ tools),
> but couldn't find much on exporting/writing to file in the
> Rsamtools-Overview.pdf vignette or on google.
>
> Is there a function/utility which converts an Rsamtools bam list region to
> ShortReadQ object(s) for export using writeFastq()
> with single end *and* paired end alignments which have been read in via the
> Rsamtools interface?
>
> ## Load a bam
> bam <- scanBam(...)
>
> ## Filter using R's subsetting,
> bamRegion <- bam[[1]][ ... ] ## list of list
>
> ## Should work for single end alignments, but not paired end alignments
> ## Read pairs may have to index on mpos = PNEXT field
> rfq <- ShortReadQ(bamRegion[["seq"]], bamRegion[["qual"]],
> BStringSet(bamRegion[["qname"]]))
>
> If not is the area of exporting from Rsamtools worth developing futher?
I think I would use readBamGappedAlignments, specifying what=c("seq", "qual")
and maybe "qname", flag=scanBamFlag(<...>) and perhaps the tag argument to
filter / load relevant records, and then create a fastq object ShortReadQ(<...>)
and writeFastq.
This could be made space efficient using BamFile(, yieldSize=) and iterating through
bf = open(BamFile(...))
out = FastqFile("..."
while (nrow(ga <- readBamGappedAlignments(bf, ...))) {
## filter, create fq = ShortReadQ
writeFastq(fq, out, mode ="a")
}
close(bf)
close(out)
>
> cheers,
>
> Marcus
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list