[BioC] Sorting a GAlignments object by QNAME
Martin Morgan
mtmorgan at fhcrc.org
Sun Sep 29 22:21:03 CEST 2013
On 09/29/2013 12:32 PM, rubi [guest] wrote:
>
> Is there a way to sort the records in a GAlignments object by QNAME? The
> object is created with the readGAlignmentsFromBam function, for which the BAM
> file and its corresponding index file must be sorted by RNAME and POS.
>
> Unless I'm missing something, the only way I can see to do this is to read
> the BAM file into a data.table and sort that.
Unsorted or qname-sorted files can be read in; the part that is likely
tripping you up is the need to specify character() for the index, perhaps
together with yieldSize and obeyQname:
bf = open(BamFile(fl, character(), yieldSize=1000000, obeyQname=TRUE))
If fl were sorted by qname (see ?sortBam, byQname=TRUE), then this would
guarantee 1000000 qnames per chunk:
repeat {
    aln = readGAlignmentsFromBam(bf)
    if (length(aln) == 0)
        break
    ## do work
}
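Putting the pieces above together, here is a minimal end-to-end sketch, assuming the Rsamtools/GenomicRanges versions from the sessionInfo below (in later Bioconductor releases readGAlignmentsFromBam was replaced by readGAlignments in the GenomicAlignments package). The input is the ex1.bam example file shipped with Rsamtools; the tempfile() destination, the smaller yieldSize of 1000, and the n_total counter are illustrative choices, not part of the advice above.

```r
## Sketch only: assumes the 2013-era Rsamtools API, where
## readGAlignmentsFromBam() accepts an open BamFile.
library(Rsamtools)

## Example BAM shipped with Rsamtools (sorted by RNAME, POS)
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Re-sort by query name; sortBam() appends ".bam" to the destination
## and returns the path of the file it wrote
sorted <- sortBam(fl, tempfile(), byQname = TRUE)

## character() as the index: a qname-sorted file has no usable index.
## obeyQname = TRUE keeps all records of a qname in the same chunk.
bf <- open(BamFile(sorted, character(), yieldSize = 1000, obeyQname = TRUE))

n_total <- 0L   # illustrative running count of alignments seen
repeat {
    ## use.names = TRUE makes names(aln) the QNAMEs
    aln <- readGAlignmentsFromBam(bf, use.names = TRUE)
    if (length(aln) == 0L)
        break
    ## do work on this chunk
    n_total <- n_total + length(aln)
}
close(bf)
```

If the whole file fits in memory, a GAlignments object read with use.names=TRUE can also be reordered directly with aln[order(names(aln))]; that is plain character ordering, which may not match the order produced by sortBam(byQname=TRUE).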
Since you have the devel version, see also ?readGAlignmentsListFromBam, which
reads mated reads from an RNAME, POS-sorted file in an iteration like the one
above, with more modest memory requirements than reading in the entire file.
Martin
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>     LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>     base
>
> other attached packages:
> [1] doParallel_1.0.3      iterators_1.0.6       foreach_1.4.1
>     data.table_1.8.10     Rsamtools_1.13.44     Biostrings_2.29.19
>     GenomicRanges_1.13.45 XVector_0.1.4
> [9] IRanges_1.19.38       BiocGenerics_0.7.5
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6    codetools_0.2-8 stats4_3.0.2    tools_3.0.2
>     zlibbioc_1.7.0
>
> -- Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________ Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793