[BioC] Rsamtools problem

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Mar 29 15:49:31 CEST 2011


Hi,

On Tue, Mar 29, 2011 at 6:27 AM, Vincenzo Capece <vivo0304 at gmail.com> wrote:
> dear all,
> i am a beginner and i need an information about bam header.
> The header of my bam file is:
[clipped this as its not really important]
>
> I want to use Dataframe function in this way:
> DataFrame with 6 rows and 5 columns
> rname strand pos qwidth
> <integer> <integer> <integer> <integer>
> 1 1 1 970 35
> 2 1 1 971 35
> 3 1 1 972 35
> 4 1 1 973 35
> 5 1 1 974 35
> 6 1 2 975 35
> seq
> <DNAStringSet>
> 1 TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
> 2 ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
> 3 TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
> 4 TAGGAAATGCTTTACTGTCATAACTATGAAGAGAC
> 5 AGGAAATGCTTTACTGTCATAACTATGAAGAGACT
> 6 GGAAATGCTTTACTGTCATAACTATGAAGAGACTA
>
> If i launch:
> df <- do.call("DataFrame", bam)

How did you get `bam`?

This wouldn't work unless `bam` is essentially a list of Vector-like
values, and not a list of lists (as is the default that is returned
from scaBam.

> my result is:
>
> DataFrame with 0 rows and 5 columns

That looks about right:

> And my bam file is:
>
> $`chr1:1-249250621`
> $`chr1:1-249250621`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr1:1-249250621`$strand
> factor(0)
> Levels: + - *
>
> $`chr1:1-249250621`$pos
> integer(0)
>
> $`chr1:1-249250621`$qwidth
> integer(0)
>
> $`chr1:1-249250621`$seq
>  A DNAStringSet instance of length 0
>
>
> $`chr2:1-243199373`
> $`chr2:1-243199373`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr2:1-243199373`$strand
> factor(0)
> Levels: + - *
>
> $`chr2:1-243199373`$pos
> integer(0)
>
> $`chr2:1-243199373`$qwidth
> integer(0)
>
> $`chr2:1-243199373`$seq
>  A DNAStringSet instance of length 0

This isn't a "bam file", but rather looks to be a result that was
returned by scanBam, but look: the result of your query within the
range `chr2:1-243199373` is empty.

> What is my error?

It seems like a few things, but also hard to diagnose w/o more code from you.

The first thing you need to figure out is why your scanBam query isn't
returning any reads on chromosome 2.

> Where in header BAM file i can find the chr lenghts?

You've found it in your header -- sorry I cut it out, but these were
the relevant lines:

@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067

I have a small utility function that reads bam headers and pulls out
chromosome info. It uses the `scanBamHeader` function from Rsamtools:

parseChromosomeInfoFromBAM <- function(bam.file) {
  result <- scanBamHeader(bam.file)[[1]]$targets
  df <- data.frame(name=names(result), length=result)
  rownames(df) <- NULL
  df
}

Which returns a data.frame with the info you're after, like:

R> head(parseChromosomeInfoFromBAM('/path/to/file.bam'))
  name    length
1 chr1 249250621
2 chr2 243199373
3 chr3 198022430
4 chr4 191154276
5 chr5 180915260
6 chr6 171115067

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list