[Bioc-sig-seq] large BAM files and large BED files
Rene Paradis
rene.paradis at genome.ulaval.ca
Mon Sep 19 20:26:46 CEST 2011
Thanks Martin and Michael for your constructive advice.
I used a ScanBamParam object to successfully load part of chr1 from a
BAM file via scanBam. Honestly, I do not know what the differences are
between readGappedAlignments, readBamGappedAlignments, and scanBam;
the last two can take a ScanBamParam object.
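For reference, this is roughly what worked for me (a minimal sketch;
'my.bam' is a placeholder file name):

    library(Rsamtools)       # scanBam, ScanBamParam
    library(GenomicRanges)   # GRanges
    ## restrict scanBam to a slice of chr1
    param <- ScanBamParam(which = GRanges("chr1", IRanges(1, 1000000)),
                          what  = c("rname", "pos", "cigar"))
    res <- scanBam("my.bam", param = param)   # list of field vectors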
But I wish I could select only the seqname in a GRanges to retrieve all
of chr1 (as an example) from the BAM file. It seems I must select a
range, so I put a value that goes beyond the end of chr1, since I do
not know its length, and I got <<INTEGER() can only be applied to a
'integer', not a 'special'>>. There must be something I missed that
could help me do that.
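What I am aiming for, I think, is something like this sketch, where the
chr1 length is read from the BAM header instead of guessed ('my.bam' is
a placeholder):

    library(Rsamtools)
    library(GenomicRanges)
    ## the BAM header lists each reference sequence and its length
    targets <- scanBamHeader("my.bam")[[1]]$targets   # named integer vector
    chr1len <- targets[["chr1"]]
    ## a range covering chr1 exactly, so no guessed endpoint is needed
    param <- ScanBamParam(which = GRanges("chr1", IRanges(1, chr1len)))
    aln <- scanBam("my.bam", param = param)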
Ultimately, I want to launch a PICS analysis, which requires a
segReadsList object.
Overall, I have definitely made progress with your help; thank you.
Rene
On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote:
> On 09/16/2011 02:11 PM, Michael Lawrence wrote:
> > It sounds like you're trying to use BED as an alternative to BAM? Probably
> > not a good idea, especially at this scale. Why are you aiming for a
> > GenomeData? A GappedAlignments might be more appropriate. See
> > GenomicRanges::readGappedAlignments() for bringing a BAM into a
> > GappedAlignments.
>
> Hi Rene
>
> the 'which' argument to readGappedAlignments (it'll become 'param' with
> the next release, and be a ScanBamParam object) allows you to select
> regions to process, e.g., chromosome-at-a-time, to help with file size.
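> Something like this sketch, for example (untested; 'my.bam' is a
> placeholder, and the exact signature depends on your release, so
> 'which' accepting a RangesList is an assumption here):
>
>     library(GenomicRanges)   # readGappedAlignments
>     library(Rsamtools)       # ScanBamParam, for the 'param' form
>     ## current release: 'which' selects the regions to read
>     which <- RangesList(chr1 = IRanges(1, 200000000))
>     aln <- readGappedAlignments("my.bam", which = which)
>     ## next release: the same selection through 'param'
>     ## param <- ScanBamParam(which = which)
>     ## aln <- readGappedAlignments("my.bam", param = param)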
>
> Martin
> >
> > This page might help:
> > http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources
> >
> > But it could really be improved.
> >
> > Michael
> >
> > On Fri, Sep 16, 2011 at 1:44 PM, Rene Paradis <rene.paradis at genome.ulaval.ca> wrote:
> >
> >> Hello,
> >>
> >> I am experiencing a problem loading 30 GB BED files into memory. My
> >> call to read.table raises the error: Error in unique(x) : length
> >> xxxxxx is too large for hashing.
> >>
> >> This is generated by the MKsetup function in unique.c. Even after
> >> increasing the value 10,000-fold, the error persists. I believe the
> >> function pushes more data into RAM, but I am not sure this is the
> >> right thing to focus on.
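> >>
> >> For what it is worth, a chunked read like the following sketch might
> >> sidestep the hashing step, though I have not verified it (it assumes
> >> a plain tab-separated 6-column BED; 'my.bed' is a placeholder):
> >>
> >>     ## fixed column classes skip the type-guessing pass
> >>     cols <- c("character", "integer", "integer",
> >>               "character", "numeric", "character")
> >>     con <- file("my.bed", open = "r")
> >>     repeat {
> >>         chunk <- tryCatch(
> >>             read.table(con, nrows = 1000000, sep = "\t",
> >>                        colClasses = cols, comment.char = ""),
> >>             error = function(e) NULL)   # NULL at end of file
> >>         if (is.null(chunk)) break
> >>         ## ... process or accumulate 'chunk' here ...
> >>         if (nrow(chunk) < 1000000) break
> >>     }
> >>     close(con)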
> >>
> >> Ultimately, I would like to produce a GenomeData object from either
> >> a BAM file or a BED file.
> >>
> >> Has someone ever worked with very, very big BAM files (about 30 GB)?
> >>
> >> Thanks,
> >>
> >> Rene Paradis