[Bioc-sig-seq] transforming bam for TEQC
Martin Morgan
mtmorgan at fhcrc.org
Thu Jun 2 14:18:35 CEST 2011
On 06/02/2011 01:45 AM, David A. wrote:
> Dear Martin, thanks a lot for your suggestion, but I am getting an error
> with one of the samples. The other sample seems to load fine, so it
> could be that this one is too large. I haven't found information about
> this error, can you suggest something?
>
> > pars<-ScanBamParam(flag=scanBamFlag(isProperPair=TRUE),
> what=c("rname","strand","pos","qwidth","seq","isize"))
> > data7<-scanBam('/Data/run5/aligned/s_7.bam',param=pars)[[1]]
> Error in .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param
> = param) :
> too many nucleotides, use 'param=ScanBamParam(which=<...>)'
Hi Dave -- this is an issue with Rsamtools that is on my radar to
address. The problem is with the 'seq' field requested in 'what': the
total number of nucleotides exceeds the maximum integer R can represent
(2^31 - 1). The workaround is either to omit 'seq' or to read the data
in chunks (e.g., by chromosome, using the 'which' argument to select
one chromosome at a time) and concatenate the results, as in
c(chr1rd, chr2rd). Probably you'd do

listOfRangedData = lapply(chrs, function(chr, ...) {
<input chr to RangedData> })
rd = do.call(c, listOfRangedData)

Martin
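
For the overall mean insert size asked about in the quoted message, the chunked approach above might look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread. The helper names (chunkSummary, combineMeans, meanInsertSize) are made up for this example, GenomicRanges::GRanges is my choice for building the 'which' ranges, and the BAM file is assumed to have an index, which 'which' requires. Only 'isize' is requested, so the 'seq' limit is never approached.

```r
# Sketch only: scan one chromosome at a time and never load 'seq',
# so no single call approaches .Machine$integer.max (2^31 - 1).

# Summarise one chunk of isize values, counting each pair once:
# in a proper pair one mate has isize > 0 and the other isize < 0.
chunkSummary <- function(isize) {
    keep <- !is.na(isize) & isize > 0
    # as.numeric() avoids integer overflow when summing many values
    c(sum = sum(as.numeric(isize[keep])), n = sum(keep))
}

# Combine per-chunk sums and counts into one overall mean.
combineMeans <- function(sums, counts) sum(sums) / sum(counts)

# Hypothetical driver; 'bamFile' must be an indexed BAM file.
meanInsertSize <- function(bamFile) {
    # Reference sequence names and lengths from the BAM header
    targets <- Rsamtools::scanBamHeader(bamFile)[[1]]$targets
    stats <- vapply(names(targets), function(chr) {
        which <- GenomicRanges::GRanges(chr,
            IRanges::IRanges(1, targets[[chr]]))
        param <- Rsamtools::ScanBamParam(
            flag = Rsamtools::scanBamFlag(isProperPair = TRUE),
            what = "isize", which = which)
        chunkSummary(Rsamtools::scanBam(bamFile, param = param)[[1]]$isize)
    }, c(sum = 0, n = 0))
    combineMeans(stats["sum", ], stats["n", ])
}
```

With the file from the quoted message this would be called as meanInsertSize('/Data/run5/aligned/s_7.bam'). Keeping a running sum and count per chromosome means the per-chromosome results never have to be held in memory together.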
>
> It is prompting me to use the 'which' argument, I guess to select a
> part of the file (aligned against hg19), but how can I handle the BAM
> file if I want to load it in full and then calculate the overall mean
> insert size?
>
>
>
> Thanks,
>
> Dave
>
> > Date: Tue, 31 May 2011 17:12:22 -0700
> > From: mtmorgan at fhcrc.org
> > To: dasolexa at hotmail.com
> > CC: bioc-sig-sequencing at r-project.org
> > Subject: Re: [Bioc-sig-seq] transforming bam for TEQC
> >
> > On 05/31/2011 05:42 AM, David A. wrote:
> > >
> > > Hi, I would like to load my paired-end bam file into TEQC. The
> > > manual says that the bed file needed for paired-end reads should
> > > contain the read pair ID. How can I get this format? The bam2bed
> > > converters I know only give the three main columns, and if I am
> > > not wrong the BEDPE format contains more columns than needed.
> >
> > Hi Dave -- I haven't used TEQC (looks good, though) but since its
> > get.reads function returns a RangedData object with mate pairs as
> > successive rows (from example(get.reads); reads) it seems like this
> > could be constructed directly from your bam file using
> > Rsamtools::scanBam and IRanges::RangedData. I think you'll start with
> > something like
> >
> > param <- ScanBamParam(flag=scanBamFlag(isProperPair=TRUE),
> > what=c("qname", "pos", "qwidth", "rname"))
> > aln = scanBam(fl, param=param)[[1]]
> > rd = with(aln, RangedData(IRanges(pos, width=qwidth), ID=qname,
> > space=rname))
> >
> > rd[order(rd$space, rd$ID)]
> >
> > Martin
> >
> > >
> > > Any help would be greatly appreciated
> > >
> > > Cheers,
> > >
> > > Dave
> > >
> >
> >
> > --
> > Computational Biology
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> >
> > Location: M1-B861
> > Telephone: 206 667-2793