[BioC] filterBam
rcaloger
raffaele.calogero at unito.it
Tue Jul 16 15:20:40 CEST 2013
Hi Martin,
thanks again for the quick response.
I managed to build the filtered bam file:
filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons",
indexDestination=FALSE, param=ScanBamParam(which=exons.gr))
sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted")
indexBam("accepted_hits.tp_exons.sorted.bam")
Cheers
raf
On 7/16/13 3:04 PM, Martin Morgan wrote:
> On 07/16/2013 05:01 AM, rcaloger wrote:
>> Dear Martin,
>> many thanks for the information.
>>
>> I got the point:
>> in exons.gr:
>> seqlevels(exons.gr)
>> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
>> "chr17"
>> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7"
>> "chr8"
>> [19] "chr9" "chrX"
>>
>> seqlevels(BamFile("accepted_hits.sorted.bam")
>> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
>> "chr17"
>> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7"
>> "chr8"
>> [19] "chr9" "chrM" "chrX" "chrY"
>>
>> "chrM" is missing in exons.gr
>>
>> There is any simple way to add to exons.gr also the chrM?
>
> Thanks for the file; the missing 'chrM' should not matter, and the
> ranges look like they are sorted to me. Maybe a workaround is to use
> the indexDestination=FALSE argument to filterBam, and then sort and
> index the filtered results
>
> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons",
> indexDestination=FALSE, param=ScanBamParam(which=exons.gr))
> sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted")
> indexBam("accepted_hits.tp_exons.sorted")
>
> Martin
>
>> you can get the test.Rda at :
>> https://mail.unito.it/mailftp/uplister/get_file.php?file=ZmlsZXMvRjEvNDAvZDBjMGExMTk0OTkyYWM3YzY4Nzg0MTFlYjc1My80MzA0ZWU0MzA1OGMzYTBhMmQ1NWZjMWZjNDM4M2M5ZQ==
>>
>> Many thanks !
>> Raf
>>
>> the file I used to generate exon.gr was ordered by chr and then by
>> start position
>>
>> On 7/16/13 1:25 PM, Martin Morgan wrote:
>>> On 07/15/2013 11:19 PM, rcaloger wrote:
>>>> Hi,
>>>> I am trying to filter a bam file using a set of coordinates.
>>>> I sorted the bam file using both sortBam or Picard SortSam tools
>>>> but I always
>>>> get the same error:
>>>>
>>>> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons",
>>>> param=ScanBamParam(which=exons.gr))
>>>>
>>>> Error in
>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
>>>>
>>>>
>>>> :
>>>> failed to build index
>>>> file:
>>>> /data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons
>>>>
>>>>
>>>>
>>>> In addition: Warning messages:
>>>> 1: In
>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
>>>>
>>>>
>>>> :
>>>> [bam_index_core] the alignment is not sorted
>>>> (D44TDFP1_1:1:1102:6940:117318):
>>>> 85079683 > 84952092 in 1-th chr
>>>> 2: In
>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]],
>>>>
>>>>
>>>> :
>>>> [bam_index_build2] fail to index the BAM file.
>>>
>>> filterBam tries to place the GRanges into correct order by calling
>>>
>>> bf <- BamFile("accepted_hits.sorted.bam")
>>> xx <- Rsamtools:::.filterBam_preprocess(bf, param)
>>>
>>> after which
>>>
>>> bamWhich(xx)
>>>
>>> is supposed to have (a) non-overlapping ranges with (b) seqnames in
>>> the same
>>> order as in the original BAM file and (c) ordered by start position.
>>>
>>> I'm supposing that this fails for your exons.gr, and would be happy to
>>> trouble-shoot if you're willing to share (off-list is fine) the
>>> result of
>>>
>>> save(exons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")),
>>> file="test.Rda")
>>>
>>> Martin
>>>
>>>>
>>>>
>>>> Any suggestion how to handle this problem?
>>>> cheers
>>>> Raf
>>>>
>>>> exons.gr
>>>> GRanges with 3399 ranges and 1 metadata column:
>>>> seqnames ranges strand | transcriptID
>>>> <Rle> <IRanges> <Rle> | <character>
>>>> [1] chr1 [ 4764598, 4766882] * | uc007aff.2
>>>> [2] chr1 [ 7110275, 7110696] * | uc007agb.1
>>>> [3] chr1 [ 9858548, 9858769] * | uc007agw.1
>>>> [4] chr1 [10029961, 10029968] * | uc007ahk.1
>>>> ... ... ... ... ... ...
>>>> [3397] chrX [147525139, 147525147] * | uc012hqq.1
>>>> [3398] chrX [148034128, 148034929] * | uc009upe.1
>>>> [3399] chrX [156357850, 156359017] * | uc009uss.2
>>>> ---
>>>> seqlengths:
>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8
>>>> chr9 chrX
>>>> NA NA NA NA NA NA ... NA NA NA NA
>>>> NA NA
>>>>
>>>> sessionInfo()
>>>> R version 2.15.3 (2013-03-01)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=C LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7
>>>> [4] IRanges_1.16.6 BiocGenerics_0.4.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0
>>>>
>>>> -- ----------------------------------------
>>>> Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC
>>>> Centro di
>>>> Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39
>>>> 0116706457 Fax
>>>> ++39 0112366457 Mobile ++39 3333827080 email:
>>>> raffaele.calogero at unito.it
>>>> raffaele[dot]calogero[at]gmail[dot]com www:
>>>> http://www.bioinformatica.unito.it
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>>
>> --
>>
>> ----------------------------------------
>> Prof. Raffaele A. Calogero
>> Bioinformatics and Genomics Unit
>> MBC Centro di Biotecnologie Molecolari
>> Via Nizza 52, Torino 10126
>> tel. ++39 0116706457
>> Fax ++39 0112366457
>> Mobile ++39 3333827080
>> email:raffaele.calogero at unito.it
>> raffaele[dot]calogero[at]gmail[dot]com
>> www:http://www.bioinformatica.unito.it
>>
>
>
--
----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel. ++39 0116706457
Fax ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
raffaele[dot]calogero[at]gmail[dot]com
www: http://www.bioinformatica.unito.it
More information about the Bioconductor
mailing list