[Bioc-sig-seq] Rsamtools

Martin Morgan mtmorgan at fhcrc.org
Sun Feb 7 15:19:39 CET 2010


On 01/28/2010 01:50 PM, Dykema, Karl wrote:
> Martin/Steve,
> 
> Thanks a lot guys, using the correct chromosome names fixed that
> problem. Instead of "chr1", I simply needed to use "1". Now I am able
> to import reads by looping through, but I am getting some strange
> results. In this whole-exon-capture dataset I would expect the
> largest chromosomes to have the highest number of reads but that does
> not appear to be what I am getting (highlighted below). With respect
> to file size the BAM appears to be within the range I would expect,
> but after importing into my AlignedRead object I should have roughly
> 100 times the number of reads I am getting now. Anyone see something
> I am doing incorrectly?

To close this thread, after some off-list discussion it seems like the
issue is upstream of Rsamtools. scanBam() with no params returns the
expected results, and scanBam with 'which' returns unusual results that
are nonetheless the same as the equivalent samtools command line result.

Martin

> 
> Thanks again!
> 
> -karl
> 
> 
> 
> 
> library(org.Hs.eg.db)
> 
> lens <- org.Hs.egCHRLENGTHS
> lens <- lens[c(1:22,"X","Y")]
> 
> i=1
> which <- RangesList(IRanges(start=1,end=lens[i]))
> names(which) <- names(lens)[i]
> params <- ScanBamParam(which=which)
> finAR <- .readAligned_bam("../aln.sorted.nodupes.bam",param=params)
> 
> for(i in c(2:length(lens))){
> 	
> 	which <- RangesList(IRanges(start=1,end=lens[i]))
> 	names(which) <- names(lens)[i]
> 	params <- ScanBamParam(which=which)
> 	
> 	finAR <- append(finAR,.readAligned_bam("../aln.sorted.nodupes.bam",param=params))
> }
> 
> 
>> sort(table(chromosome(finAR)),dec=T)[1:5]
> 
>     12     11      8      X      2
> 619386 281709 260369 217034 173985
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
> Sent: Wednesday, January 27, 2010 8:26 PM
> To: Steve Lianoglou
> Cc: Dykema, Karl; Bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Rsamtools
> 
> On 01/27/2010 02:18 PM, Steve Lianoglou wrote:
>> Hi Karl,
>>
>> On Wed, Jan 27, 2010 at 5:00 PM, Dykema, Karl <Karl.Dykema at vai.org> wrote:
>>> Hi all,
>>>
>>> We have been using Rsamtools to import BAM files also. On previous sequencing runs the BAM files were around 800 megs and I could import the entire BAM with .readAligned_bam(). Our new sequencing data creates larger BAM files and I am unable to import them like before. So, as Steve pointed out to me offline, I probably will need to import the reads from each chromosome individually. Unfortunately it does not seem to be recognizing my chromosome names, error msg below. Has anyone had a similar problem or recognize something that I might be doing incorrectly? Thanks in advance.
>>>
>>>
>>>> which <- RangesList(chr1=IRanges(start=1,end=247249719))
>>>> params <- ScanBamParam(which=which)
>>>>
>>>> chr1reads <- scanBam("../aln.sorted.nodupes.bam",param=params)
>>> Error in function (bam, tmpl, space, start, end)  : failed to scan BAM
>>>  file: �
>>>  last record: 0
>>> In addition: Warning message:
>>> In function (bam, tmpl, space, start, end)  : 'space' not in BAM header
>>>  file: ../aln.sorted.nodupes.bam
>>>  space: chr1
>>
>> Perhaps the chromosome names are different in your BAM file. I think
>> you can list them using samtools, like so (from the command line):
>>
>> $ samtools view -H aln.sorted.nodupes.bam
> 
> I think also the currently undocumented scanBamHeader
> 
>   bam = list.files(system.file("extdata", package="Rsamtools"),
>                    "bam$",full=TRUE)
>   scanBamHeader(bam)[[1]]$targets
> 
> which reports targets and numbers recorded in the header (the numbers
> are NOT the number of reads in the file, but numbers inserted by
> arbitrary upstream software, likewise the names reported in the header
> might differ from the names actually present in the bam file).
> 
> seq1 seq2
> 1575 1584
> 
> 
> Martin
> 
> 
> The information transmitted is intended only for the p...{{dropped:12}}



More information about the Bioc-sig-sequencing mailing list