[Bioc-sig-seq] transforming bam for TEQC

Martin Morgan mtmorgan at fhcrc.org
Thu Jun 2 14:18:35 CEST 2011


On 06/02/2011 01:45 AM, David A. wrote:
> Dear Martin, thanks a lot for your suggestion, but I am getting an error
> with one of the samples. The other sample seems to load fine, so it
> could be that this one is too large. I haven't found any information about
> this error; can you suggest something?
>
>  > pars<-ScanBamParam(flag=scanBamFlag(isProperPair=TRUE),
> what=c("rname","strand","pos","qwidth","seq","isize"))
>  > data7<-scanBam('/Data/run5/aligned/s_7.bam',param=pars)[[1]]
> Error in .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param
> = param) :
> too many nucleotides, use 'param=ScanBamParam(which=<...>)'

Hi Dave -- this is an issue with Rsamtools that is on my radar to
address. The problem is the 'seq' field requested in 'what': the total
number of nucleotides exceeds the maximum integer R can represent
(2^31 - 1, i.e. 2,147,483,647). The workaround is either to omit 'seq'
or to read the data in chunks (e.g., by chromosome, using 'which' to
select "chr1") and concatenate the pieces (c(chr1rd, chr2rd)). Probably
you'd do listOfRangedData = lapply(chrs, function(chr, ...) { <input chr
to RangedData> }); rd = do.call(c, listOfRangedData).
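A minimal sketch of this workaround applied to the mean-insert-size question, assuming a current Bioconductor installation (Rsamtools, GenomicRanges, IRanges) and an index file (.bai) next to the BAM, since 'which' queries need one. It requests only 'isize', so 'seq' is never loaded, and it reads one chromosome at a time using the sequence lengths reported by scanBamHeader(), concatenating the pieces with do.call(c, ...) as described above. The helper names mean_insert_size and read_isize_by_chrom are hypothetical, and keeping only isize > 0 assumes each proper pair reports its insert size once as a positive and once as a negative value.

```r
## Overall mean insert size from isize (TLEN) values. Assumption: in a proper
## pair one mate carries a positive isize and the other the matching negative
## value, so keeping isize > 0 counts each pair once.
mean_insert_size <- function(isize) {
    isize <- isize[!is.na(isize) & isize > 0]
    if (length(isize) == 0L) return(NA_real_)
    mean(as.numeric(isize))
}

## Hypothetical helper: read 'isize' (and not 'seq') for properly paired reads,
## one chromosome at a time, then concatenate the per-chromosome vectors.
## Requires an index next to the BAM file because 'which' is used.
read_isize_by_chrom <- function(bam) {
    ## chromosome names and lengths, taken from the BAM header
    targets <- Rsamtools::scanBamHeader(bam)[[1]]$targets
    per_chrom <- lapply(names(targets), function(chr) {
        region <- GenomicRanges::GRanges(chr, IRanges::IRanges(1L, targets[[chr]]))
        param <- Rsamtools::ScanBamParam(
            flag = Rsamtools::scanBamFlag(isProperPair = TRUE),
            what = "isize", which = region)
        Rsamtools::scanBam(bam, param = param)[[1]]$isize
    })
    do.call(c, per_chrom)
}

## Only runs when Rsamtools is installed and the file from the thread exists.
bam <- "/Data/run5/aligned/s_7.bam"
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists(bam)) {
    print(mean_insert_size(read_isize_by_chrom(bam)))
}
```

Because only 'isize' is requested, a single scanBam() call without 'which' may well be enough for this particular question; the chromosome loop is the pattern to fall back on when 'seq' is also needed.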

Martin

>
> It is prompting me to use the 'which' argument, I guess to select part of
> the file (alignment against hg19), but how can I deal with the BAM file
> if I want to load it completely and then calculate the overall mean insert
> size?
>
>
>
> Thanks,
>
> Dave
>
>  > Date: Tue, 31 May 2011 17:12:22 -0700
>  > From: mtmorgan at fhcrc.org
>  > To: dasolexa at hotmail.com
>  > CC: bioc-sig-sequencing at r-project.org
>  > Subject: Re: [Bioc-sig-seq] transforming bam for TEQC
>  >
>  > On 05/31/2011 05:42 AM, David A. wrote:
>  > >
>  > > Hi, I would like to load my paired-end BAM file into TEQC. The
>  > > manual says that the BED file needed for paired-end reads should
>  > > contain the read pair ID. How can I get this format? The bam2bed
>  > > converters I know only give the three main columns, and if I am not
>  > > wrong the BEDPE format contains more than is needed.
>  >
>  > Hi Dave -- I haven't used TEQC (looks good, though), but since its
>  > get.reads function returns a RangedData object with mate pairs as
>  > successive rows (from example(get.reads); reads), it seems like this
>  > could be constructed directly from your BAM file using
>  > Rsamtools::scanBam and IRanges::RangedData. I think you'll start with
>  > something like
>  >
>  > param <- ScanBamParam(flag=scanBamFlag(isProperPair=TRUE),
>  > what=c("qname", "pos", "qwidth", "rname"))
>  > aln = scanBam(fl, param=param)[[1]]
>  > rd = with(aln, RangedData(IRanges(pos, width=qwidth), ID=qname,
>  > space=rname))
>  >
>  > rd[order(rd$space, rd$ID)]
>  >
>  > Martin
>  >
>  > >
>  > > Any help would be greatly appreciated
>  > >
>  > > Cheers,
>  > >
>  > > Dave
>  > >
>  > > _______________________________________________ Bioc-sig-sequencing
>  > > mailing list Bioc-sig-sequencing at r-project.org
>  > > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>  >
>  >
>  > --
>  > Computational Biology
>  > Fred Hutchinson Cancer Research Center
>  > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>  >
>  > Location: M1-B861
>  > Telephone: 206 667-2793
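The earlier suggestion quoted above builds a RangedData with mate pairs on successive rows, sorted by chromosome and read name. As a base-R illustration of that ordering only, the sketch below takes the list that scanBam() returns for what=c("qname", "rname", "pos", "qwidth") and produces a data.frame sorted by chromosome, read name and start, so the two mates of each pair sit next to each other. The function name pairs_adjacent and its column names are hypothetical, and it does not claim to reproduce the object TEQC's get.reads returns.

```r
## Hypothetical helper: 'aln' is a list with elements qname, rname, pos and
## qwidth, as scanBam(..., what = c("qname", "rname", "pos", "qwidth"))
## would return them. Rows are ordered by chromosome, then read name, then
## start, so mates of the same pair end up on successive rows.
pairs_adjacent <- function(aln) {
    df <- data.frame(space = as.character(aln$rname),
                     start = aln$pos,
                     end   = aln$pos + aln$qwidth - 1L,  # 1-based, inclusive
                     ID    = as.character(aln$qname),
                     stringsAsFactors = FALSE)
    df[order(df$space, df$ID, df$start), , drop = FALSE]
}

## Small made-up input in the same shape as scanBam() output.
aln <- list(qname  = c("r2", "r1", "r1", "r2"),
            rname  = factor(rep("chr1", 4)),
            pos    = c(100L, 50L, 300L, 400L),
            qwidth = rep(36L, 4))
print(pairs_adjacent(aln))
```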

