[Bioc-sig-seq] Rsamtools

Martin Morgan mtmorgan at fhcrc.org
Thu Jan 28 02:26:28 CET 2010


On 01/27/2010 02:18 PM, Steve Lianoglou wrote:
> Hi Karl,
> 
> On Wed, Jan 27, 2010 at 5:00 PM, Dykema, Karl <Karl.Dykema at vai.org> wrote:
>> Hi all,
>>
>> We have been using Rsamtools to import BAM files also. On previous sequencing runs the BAM files were around 800 megs and I could import the entire BAM with .readAligned_bam(). Our new sequencing data creates larger BAM files and I am unable to import them like before. So, as Steve pointed out to me offline, I probably will need to import the reads from each chromosome individually. Unfortunately it does not seem to be recognizing my chromosome names, error msg below. Has anyone had a similar problem or recognize something that I might be doing incorrectly? Thanks in advance.
>>
>>
>>> which <- RangesList(chr1=IRanges(start=1,end=247249719))
>>> params <- ScanBamParam(which=which)
>>>
>>> chr1reads <- scanBam("../aln.sorted.nodupes.bam",param=params)
>> Error in function (bam, tmpl, space, start, end)  : failed to scan BAM
>>  file: �
>>  last record: 0
>> In addition: Warning message:
>> In function (bam, tmpl, space, start, end)  : 'space' not in BAM header
>>  file: ../aln.sorted.nodupes.bam
>>  space: chr1
> 
> Perhaps the chromosome names are different in your BAM file. I think
> you can list them using samtools, like so (from the command line):
> 
> $ samtools view -H aln.sorted.nodupes.bam

I think also the currently undocumented scanBamHeader

  bam = list.files(system.file("extdata", package="Rsamtools"),
                   "bam$",full=TRUE)
  scanBamHeader(bam)[[1]]$targets

which reports targets and numbers recorded in the header (the numbers
are NOT the number of reads in the file, but numbers inserted by
arbitrary upstream software, likewise the names reported in the header
might differ from the names actually present in the bam file).

seq1 seq2
1575 1584


Martin

> 
> You should get a lot of lines that start with @SQ -- the names of "the
> spaces" are there, some of the output from one of my files looks like:
> 
> @SQ	SN:chr1	LN:247249719
> @SQ	SN:chr2	LN:242951149
> @SQ	SN:chr3	LN:199501827
> ...
> 
> That's why this works for me:
> 
> which <- RangesList(chr1=IRanges(start=1,end=247249719))
> 
> Maybe your's might look like:
> 
> @SQ	SN:chromo1	LN:247249719
> @SQ	SN:chromo2	LN:242951149
> @SQ	SN:chromo3	LN:199501827
> 
> (or something)
> 
> Then the appropriate call for you would be:
> 
> which <- RangesList(chromo1=IRanges(start=1,end=247249719))
> 
> Does that work for you?
> 
> -steve
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list