[BioC] filterBam
rcaloger
raffaele.calogero at gmail.com
Tue Jul 16 08:19:27 CEST 2013
Hi,
I am trying to filter a bam file using a set of coordinates.
I sorted the bam file using both sortBam or Picard SortSam tools but I
always get the same error:
filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons",
param=ScanBamParam(which=exons.gr))
Error in
FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
:
failed to build index
file:
/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons
In addition: Warning messages:
1: In
FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
:
[bam_index_core] the alignment is not sorted
(D44TDFP1_1:1:1102:6940:117318): 85079683 > 84952092 in 1-th chr
2: In
FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
:
[bam_index_build2] fail to index the BAM file.
Any suggestion how to handle this problem?
cheers
Raf
exons.gr
GRanges with 3399 ranges and 1 metadata column:
seqnames ranges strand | transcriptID
<Rle> <IRanges> <Rle> | <character>
[1] chr1 [ 4764598, 4766882] * | uc007aff.2
[2] chr1 [ 7110275, 7110696] * | uc007agb.1
[3] chr1 [ 9858548, 9858769] * | uc007agw.1
[4] chr1 [10029961, 10029968] * | uc007ahk.1
... ... ... ... ... ...
[3397] chrX [147525139, 147525147] * | uc012hqq.1
[3398] chrX [148034128, 148034929] * | uc009upe.1
[3399] chrX [156357850, 156359017] * | uc009uss.2
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8
chr9 chrX
NA NA NA NA NA NA ... NA NA NA NA NA NA
sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7
[4] IRanges_1.16.6 BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0
-- ----------------------------------------
Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro
di Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39
0116706457 Fax ++39 0112366457 Mobile ++39 3333827080 email:
raffaele.calogero at unito.it raffaele[dot]calogero[at]gmail[dot]com www:
http://www.bioinformatica.unito.it
More information about the Bioconductor
mailing list