[BioC] Rsamtools problem
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Mar 29 15:49:31 CEST 2011
Hi,
On Tue, Mar 29, 2011 at 6:27 AM, Vincenzo Capece <vivo0304 at gmail.com> wrote:
> dear all,
> i am a beginner and i need an information about bam header.
> The header of my bam file is:
[clipped this as its not really important]
>
> I want to use Dataframe function in this way:
> DataFrame with 6 rows and 5 columns
> rname strand pos qwidth
> <integer> <integer> <integer> <integer>
> 1 1 1 970 35
> 2 1 1 971 35
> 3 1 1 972 35
> 4 1 1 973 35
> 5 1 1 974 35
> 6 1 2 975 35
> seq
> <DNAStringSet>
> 1 TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
> 2 ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
> 3 TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
> 4 TAGGAAATGCTTTACTGTCATAACTATGAAGAGAC
> 5 AGGAAATGCTTTACTGTCATAACTATGAAGAGACT
> 6 GGAAATGCTTTACTGTCATAACTATGAAGAGACTA
>
> If i launch:
> df <- do.call("DataFrame", bam)
How did you get `bam`?
This wouldn't work unless `bam` is essentially a list of Vector-like
values, and not a list of lists (as is the default that is returned
from scaBam.
> my result is:
>
> DataFrame with 0 rows and 5 columns
That looks about right:
> And my bam file is:
>
> $`chr1:1-249250621`
> $`chr1:1-249250621`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr1:1-249250621`$strand
> factor(0)
> Levels: + - *
>
> $`chr1:1-249250621`$pos
> integer(0)
>
> $`chr1:1-249250621`$qwidth
> integer(0)
>
> $`chr1:1-249250621`$seq
> A DNAStringSet instance of length 0
>
>
> $`chr2:1-243199373`
> $`chr2:1-243199373`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr2:1-243199373`$strand
> factor(0)
> Levels: + - *
>
> $`chr2:1-243199373`$pos
> integer(0)
>
> $`chr2:1-243199373`$qwidth
> integer(0)
>
> $`chr2:1-243199373`$seq
> A DNAStringSet instance of length 0
This isn't a "bam file", but rather looks to be a result that was
returned by scanBam, but look: the result of your query within the
range `chr2:1-243199373` is empty.
> What is my error?
It seems like a few things, but also hard to diagnose w/o more code from you.
The first thing you need to figure out is why your scanBam query isn't
returning any reads on chromosome 2.
> Where in header BAM file i can find the chr lenghts?
You've found it in your header -- sorry I cut it out, but these were
the relevant lines:
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
I have a small utility function that reads bam headers and pulls out
chromosome info. It uses the `scanBamHeader` function from Rsamtools:
parseChromosomeInfoFromBAM <- function(bam.file) {
result <- scanBamHeader(bam.file)[[1]]$targets
df <- data.frame(name=names(result), length=result)
rownames(df) <- NULL
df
}
Which returns a data.frame with the info you're after, like:
R> head(parseChromosomeInfoFromBAM('/path/to/file.bam'))
name length
1 chr1 249250621
2 chr2 243199373
3 chr3 198022430
4 chr4 191154276
5 chr5 180915260
6 chr6 171115067
Hope that helps,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list