[Bioc-sig-seq] large BAM files and large BED files
Rene Paradis
rene.paradis at genome.ulaval.ca
Mon Sep 19 20:26:46 CEST 2011
Thanks Martin and Michael for your constructive advice.
I used a ScanBamParam object to successfully load part of chr1 from a
BAM file via scanBam. Honestly, I do not know what the differences are
between readGappedAlignments, readBamGappedAlignments, and scanBam;
the last two can take a ScanBamParam object.
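For reference, this is roughly what worked for me (a minimal sketch;
'my.bam' is a placeholder file name):

    library(Rsamtools)       # scanBam, ScanBamParam
    library(GenomicRanges)   # GRanges
    ## restrict scanBam to a slice of chr1
    param <- ScanBamParam(which = GRanges("chr1", IRanges(1, 1000000)),
                          what  = c("rname", "pos", "cigar"))
    res <- scanBam("my.bam", param = param)   # list of field vectors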
But I wish I could select only the seqname in a GRanges to retrieve all
of chr1 (as an example) from the BAM file. It seems I must select a
range, so I put a value that goes beyond the end of chr1, since I do
not know its length, and I got <<INTEGER() can only be applied to a
'integer', not a 'special'>>. There must be something I missed that
could help me do that.
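What I am aiming for, I think, is something like this sketch, where the
chr1 length is read from the BAM header instead of guessed ('my.bam' is
a placeholder):

    library(Rsamtools)
    library(GenomicRanges)
    ## the BAM header lists each reference sequence and its length
    targets <- scanBamHeader("my.bam")[[1]]$targets   # named integer vector
    chr1len <- targets[["chr1"]]
    ## a range covering chr1 exactly, so no guessed endpoint is needed
    param <- ScanBamParam(which = GRanges("chr1", IRanges(1, chr1len)))
    aln <- scanBam("my.bam", param = param)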
Ultimately, I want to launch a PICS analysis, which requires a
segReadsList object.
Overall, I have definitely made progress with your help; thank you.
Rene
On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote:
> On 09/16/2011 02:11 PM, Michael Lawrence wrote:
> > It sounds like you're trying to use BED as an alternative to BAM? Probably
> > not a good idea, especially at this scale. Why are you aiming for a
> > GenomeData? A GappedAlignments might be more appropriate. See
> > GenomicRanges::readGappedAlignments() for bringing a BAM into a
> > GappedAlignments.
>
> Hi Rene
>
> the 'which' argument to readGappedAlignments (it'll become 'param' with
> the next release, and be a ScanBamParam object) allows you to select
> regions to process, e.g., chromosome-at-a-time, to help with file size.
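> Something like this sketch, for example (untested; 'my.bam' is a
> placeholder, and the exact signature depends on your release, so
> 'which' accepting a RangesList is an assumption here):
>
>     library(GenomicRanges)   # readGappedAlignments
>     library(Rsamtools)       # ScanBamParam, for the 'param' form
>     ## current release: 'which' selects the regions to read
>     which <- RangesList(chr1 = IRanges(1, 200000000))
>     aln <- readGappedAlignments("my.bam", which = which)
>     ## next release: the same selection through 'param'
>     ## param <- ScanBamParam(which = which)
>     ## aln <- readGappedAlignments("my.bam", param = param)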
>
> Martin
> >
> > This page might help:
> > http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources
> >
> > But it could really be improved.
> >
> > Michael
> >
> > On Fri, Sep 16, 2011 at 1:44 PM, Rene Paradis <rene.paradis at genome.ulaval.ca> wrote:
> >
> >> Hello,
> >>
> >> I am experiencing a problem loading 30 GB BED files into memory. My
> >> call to read.table raises the error: Error in unique(x) : length
> >> xxxxxx is too large for hashing.
> >>
> >> This is generated by the MKsetup function in unique.c. Even after
> >> increasing the value 10,000-fold, the error persists. I believe the
> >> function pushes more data into RAM, but I am not sure this is the
> >> right thing to focus on.
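> >>
> >> For what it is worth, a chunked read like the following sketch might
> >> sidestep the hashing step, though I have not verified it (it assumes
> >> a plain tab-separated 6-column BED; 'my.bed' is a placeholder):
> >>
> >>     ## fixed column classes skip the type-guessing pass
> >>     cols <- c("character", "integer", "integer",
> >>               "character", "numeric", "character")
> >>     con <- file("my.bed", open = "r")
> >>     repeat {
> >>         chunk <- tryCatch(
> >>             read.table(con, nrows = 1000000, sep = "\t",
> >>                        colClasses = cols, comment.char = ""),
> >>             error = function(e) NULL)   # NULL at end of file
> >>         if (is.null(chunk)) break
> >>         ## ... process or accumulate 'chunk' here ...
> >>         if (nrow(chunk) < 1000000) break
> >>     }
> >>     close(con)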
> >>
> >> Ultimately, I would like to produce a GenomeData object from either
> >> a BAM file or a BED file.
> >>
> >> Has someone ever worked with very, very big BAM files (about 30 GB)?
> >>
> >> Thanks,
> >>
> >> Rene Paradis