[Bioc-devel] Rsamtools: How to import only the first 1000 or so reads

Michael Lawrence lawrence.michael at gene.com
Thu Jan 21 14:56:07 CET 2016


Copy and pasted directly from inside gmapR:

guessReadLengthFromBam <- function(x, n=100L) {
  ga <- readGAlignments(BamFile(x, yieldSize=n))
  readlen <- unique(qwidth(ga))
  if (length(readlen) != 1L)
    NA_integer_
  else readlen
}


On Thu, Jan 21, 2016 at 5:31 AM, Christian Arnold
<christian.arnold at embl.de> wrote:
> Dear Bioc-Community,
>
> I was wondering if it is currently possible to only import the first X reads
> rather than all reads from an arbitrary BAM file into a list using Rsamtools
> with scanBam? I did not find any parameter in scanBamParam that seems to
> capture what I need. Specifically, I do not want to specify genomic regions
> because for arbitrary BAM files, there might not be any read overlapping.
> Command-line solutions are also not an option here because I want to
> integrate this into the SNPhood package and it should run on Windows too.
>
> The reason for this task is that I want to automatically determine the read
> length from a BAM file, and my currently favoured strategy is to import the
> first 1000 reads or so and then determine the maximum length of the
> sequence, which should be a good proxy for the real read length.
>
> Thanks for suggestions,
> Christian
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



More information about the Bioc-devel mailing list