[BioC] Rsamtools problem
Martin Morgan
mtmorgan at fhcrc.org
Tue Mar 29 15:48:23 CEST 2011
On 03/29/2011 03:27 AM, Vincenzo Capece wrote:
> dear all,
> i am a beginner and i need an information about bam header.
> The header of my bam file is:
>
> @SQ SN:chr1 LN:249250621
> @SQ SN:chr2 LN:243199373
> @SQ SN:chr3 LN:198022430
> @SQ SN:chr4 LN:191154276
> @SQ SN:chr5 LN:180915260
> @SQ SN:chr6 LN:171115067
> @SQ SN:chr7 LN:159138663
> @SQ SN:chrX LN:155270560
> @SQ SN:chr8 LN:146364022
> @SQ SN:chr9 LN:141213431
> @SQ SN:chr10 LN:135534747
> @SQ SN:chr11 LN:135006516
> @SQ SN:chr12 LN:133851895
> @SQ SN:chr13 LN:115169878
> @SQ SN:chr14 LN:107349540
> @SQ SN:chr15 LN:102531392
> @SQ SN:chr16 LN:90354753
> @SQ SN:chr17 LN:81195210
> @SQ SN:chr18 LN:78077248
> @SQ SN:chr20 LN:63025520
> @SQ SN:chrY LN:59373566
> @SQ SN:chr19 LN:59128983
> @SQ SN:chr22 LN:51304566
> @SQ SN:chr21 LN:48129895
> @SQ SN:chr6_ssto_hap7 LN:4928567
> @SQ SN:chr6_mcf_hap5 LN:4833398
> @SQ SN:chr6_cox_hap2 LN:4795371
> @SQ SN:chr6_mann_hap4 LN:4683263
> @SQ SN:chr6_apd_hap1 LN:4622290
> @SQ SN:chr6_qbl_hap6 LN:4611984
> @SQ SN:chr6_dbb_hap3 LN:4610396
> @SQ SN:chr17_ctg5_hap1 LN:1680828
> @SQ SN:chr4_ctg9_hap1 LN:590426
> @SQ SN:chr1_gl000192_random LN:547496
> @SQ SN:chrUn_gl000225 LN:211173
> @SQ SN:chr4_gl000194_random LN:191469
> @SQ SN:chr4_gl000193_random LN:189789
> @SQ SN:chr9_gl000200_random LN:187035
> @SQ SN:chrUn_gl000222 LN:186861
> @SQ SN:chrUn_gl000212 LN:186858
> @SQ SN:chr7_gl000195_random LN:182896
> @SQ SN:chrUn_gl000223 LN:180455
> @SQ SN:chrUn_gl000224 LN:179693
> @SQ SN:chrUn_gl000219 LN:179198
> @SQ SN:chr17_gl000205_random LN:174588
> @SQ SN:chrUn_gl000215 LN:172545
> @SQ SN:chrUn_gl000216 LN:172294
> @SQ SN:chrUn_gl000217 LN:172149
> @SQ SN:chr9_gl000199_random LN:169874
> @SQ SN:chrUn_gl000211 LN:166566
> @SQ SN:chrUn_gl000213 LN:164239
> @SQ SN:chrUn_gl000220 LN:161802
> @SQ SN:chrUn_gl000218 LN:161147
> @SQ SN:chr19_gl000209_random LN:159169
> @SQ SN:chrUn_gl000221 LN:155397
> @SQ SN:chrUn_gl000214 LN:137718
> @SQ SN:chrUn_gl000228 LN:129120
> @SQ SN:chrUn_gl000227 LN:128374
> @SQ SN:chr1_gl000191_random LN:106433
> @SQ SN:chr19_gl000208_random LN:92689
> @SQ SN:chr9_gl000198_random LN:90085
> @SQ SN:chr17_gl000204_random LN:81310
> @SQ SN:chrUn_gl000233 LN:45941
> @SQ SN:chrUn_gl000237 LN:45867
> @SQ SN:chrUn_gl000230 LN:43691
> @SQ SN:chrUn_gl000242 LN:43523
> @SQ SN:chrUn_gl000243 LN:43341
> @SQ SN:chrUn_gl000241 LN:42152
> @SQ SN:chrUn_gl000236 LN:41934
> @SQ SN:chrUn_gl000240 LN:41933
> @SQ SN:chr17_gl000206_random LN:41001
> @SQ SN:chrUn_gl000232 LN:40652
> @SQ SN:chrUn_gl000234 LN:40531
> @SQ SN:chr11_gl000202_random LN:40103
> @SQ SN:chrUn_gl000238 LN:39939
> @SQ SN:chrUn_gl000244 LN:39929
> @SQ SN:chrUn_gl000248 LN:39786
> @SQ SN:chr8_gl000196_random LN:38914
> @SQ SN:chrUn_gl000249 LN:38502
> @SQ SN:chrUn_gl000246 LN:38154
> @SQ SN:chr17_gl000203_random LN:37498
> @SQ SN:chr8_gl000197_random LN:37175
> @SQ SN:chrUn_gl000245 LN:36651
> @SQ SN:chrUn_gl000247 LN:36422
> @SQ SN:chr9_gl000201_random LN:36148
> @SQ SN:chrUn_gl000235 LN:34474
> @SQ SN:chrUn_gl000239 LN:33824
> @SQ SN:chr21_gl000210_random LN:27682
> @SQ SN:chrUn_gl000231 LN:27386
> @SQ SN:chrUn_gl000229 LN:19913
> @SQ SN:chrM LN:16571
> @SQ SN:chrUn_gl000226 LN:15008
> @SQ SN:chr18_gl000207_random LN:4262
> @PG ID:bfast VN:0.6.4e
>
> I want to use Dataframe function in this way:
> DataFrame with 6 rows and 5 columns
> rname strand pos qwidth
> <integer> <integer> <integer> <integer>
> 1 1 1 970 35
> 2 1 1 971 35
> 3 1 1 972 35
> 4 1 1 973 35
> 5 1 1 974 35
> 6 1 2 975 35
> seq
> <DNAStringSet>
> 1 TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
> 2 ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
> 3 TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
> 4 TAGGAAATGCTTTACTGTCATAACTATGAAGAGAC
> 5 AGGAAATGCTTTACTGTCATAACTATGAAGAGACT
> 6 GGAAATGCTTTACTGTCATAACTATGAAGAGACTA
>
> If i launch:
> df<- do.call("DataFrame", bam)
> my result is:
>
> DataFrame with 0 rows and 5 columns
>
> And my bam file is:
>
> $`chr1:1-249250621`
> $`chr1:1-249250621`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr1:1-249250621`$strand
> factor(0)
> Levels: + - *
>
> $`chr1:1-249250621`$pos
> integer(0)
>
> $`chr1:1-249250621`$qwidth
> integer(0)
>
> $`chr1:1-249250621`$seq
> A DNAStringSet instance of length 0
>
>
> $`chr2:1-243199373`
> $`chr2:1-243199373`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr2:1-243199373`$strand
> factor(0)
> Levels: + - *
>
> $`chr2:1-243199373`$pos
> integer(0)
>
> $`chr2:1-243199373`$qwidth
> integer(0)
>
> $`chr2:1-243199373`$seq
> A DNAStringSet instance of length 0
>
> What is my error?
> Where in header BAM file i can find the chr lenghts?
Hi --
after example(scanBam), there is variable 'fl' pointing to a bam file.
Let's use that.
t = scanBamHeader(fl)[[1]][["targets"]]
provides a vector of sequences and lengths from the header file. These
are just the values reported in the BAM file; they do not mean that
reads from those sequences are present.
To find out how many reads are in the file, maybe
param = ScanBamParam(which=GRanges(names(t), IRanges(1, t)))
countBam(fl, param=param)
to query a particular chromosome for the reads, maybe
chr = "seq2"
param = ScanBamParam(
what=c("rname", "strand", "pos", "qwidth", "seq"),
which=GRanges(chr, IRanges(1, t[chr])))
bam = scanBam(fl, param=param)
DataFrame(bam[[1]])
Martin
>
> Thanks a lot.
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioconductor
mailing list