[BioC] reading BAM files
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 29 00:35:52 CET 2012
On 11/27/2012 11:35 AM, Marcus Davy wrote:
> Hi Hermann,
> to clarify further, you can only index a sorted bam file, in Rsamtools there are
> two alternatives;
>
> 1. use asBam() which sorts and creates an index file in one go
> 2. use sortBam() and indexBam() to sort a bam file and then create an index
>
> Or run samtools from the bash prompt/command line to achieve the same result.
>
> samtools view -bS -o aln.bam aln.sam
> samtools sort aln.bam aln_sorted
> samtools index aln_sorted.bam
>
> Note, I can get a *segfault *error if sortBam() is run on a 'sam' file by
> accident, the error and sessionInfo() is at the bottom of this email (this may
> already be fixed in the development version).
In Rsamtools devel version 1.11.12, sortBam and indexBam no longer segfault when
run on non-BAM files.
Thanks for the bug report.
Martin
>
> Some pseudocode;
>
> ## unsorted bam file
> samName <- "aln.sam"
> bamName <- "aln.bam"
> sbamSuffix <- "aln_sorted" ## No suffix
> sbamName <- "aln_sorted.bam"
>
> ## 1) Sort and Index
> ## Sort into destination file 'aln_sorted.bam' using sortBam()
> sortBam(bamName, sbamSuffix)
> dir(pattern="bam")
>
> ## Index sorted bam file using indexBam()
> indexBam(sbamName)
> dir(pattern="bam")
>
> ## 2) asBam() sorted and creates index
> unlink("*bam*")
> asBam(samName, sbamSuffix)
> dir(pattern="bam")
>
> ## Working with a sorted bam file
> what <- scanBamWhat() ## subset as required
> ft <- scanBamHeader(sbamName)[[1]][["targets"]]
> which <- GRanges(names(ft), IRanges(1, ft)) ## subset as required
> param <- ScanBamParam(which=which, what=what)
> bam <- scanBam(sbamName, param = param)
>
>
> ## Note if you try to sortBam a 'sam' file Rsamtools appears to segfault
> sessionInfo()
> sortBam(samName, sbamSuffix)
>
> Marcus
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BiocInstaller_1.8.3 Rsamtools_1.10.2 Biostrings_2.26.2
> [4] GenomicRanges_1.10.2 IRanges_1.16.2 BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-4.1 parallel_2.15.0 stats4_2.15.0 tools_2.15.0
> [5] zlibbioc_1.4.0
>> sortBam(samName, sbamSuffix)
> *** caught segfault ***
> address 0x30, cause 'memory not mapped'
>
> Traceback:
> 1: .Call(.sort_bam, file, destination, byQname, as.integer(maxMemory))
> 2: .local(file, destination, ...)
> 3: sortBam(samName, sbamSuffix)
> 4: sortBam(samName, sbamSuffix)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> On Tue, Nov 27, 2012 at 6:32 AM, James W. MacDonald <jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>> wrote:
>
> Hi Hermann,
>
>
> On 11/26/2012 12:26 PM, Hermann Norpois wrote:
>
> Hello,
>
> I try to read BAM files - so far without success. And I dont know why.
>
> My Bamfile is wgEncodeOpenChromDnase8988tAln__Rep1.bam
> (downloaded from the Encode site)
>
> For example:
>
> library (ShortRead)
>
> which<- RangesList(seq1=IRanges(1000,__2000),
>
> + seq2=IRanges(c(100,1000), c(1000,2000)))
>
> what<- c("rname", "strand", "pos", "qwidth","seq")
> param<- ScanBamParam(which=which, what=what)
> bam<- scanBam ("~/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam",
>
> param=param)
> [bam_index_load] fail to load BAM index.
> Fehler in open.BamFile(BamFile(file, index), "rb") :
> failed to load BAM index
> file: /home/hnorpois/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam
>
>
> The mentioned syntax I copied from an introduction to Rsamtools.
> Actually I would like to know a simple syntax that enables me to read
> BAM-files. Can anybody give me a hint.
>
>
> The hint is in the error message above - 'failed to load BAM index', which
> indicates that you have not yet indexed your bam file. You need to first use
> indexBam(), then you should be able to read in.
>
> Best,
>
> Jim
>
>
>
>
> Thanks Hermann
>
> [[alternative HTML version deleted]]
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list