[Bioc-sig-seq] Consolidate AlignedRead objects

Fri Aug 28 18:27:49 CEST 2009

Hello Martin and Everybody,

I tried your suggestion and it works nicely when the number of reads
is not so big.

Successful example:

If I have three instances, aln000, aln050 and aln100, like this:

> aln000
class: AlignedRead
length: 9465484 reads; width: 36 cycles
chromosome: chr11.fa chr13.fa ... chr6.fa chr6.fa
position: 100667123 52735524 ... 121341376 25134423
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln050
class: AlignedRead
length: 8918057 reads; width: 36 cycles
chromosome: chr5.fa chr15.fa ... chr16.fa chr8.fa
position: 149155914 57872637 ... 95751778 36611628
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln100
class: AlignedRead
length: 11261186 reads; width: 36 cycles
chromosome: chr4.fa chr5.fa ... chr10.fa chr1.fa
position: 66224960 140647218 ... 69579797 16009268
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig

I can successfully apply the consolidating function:

> superDuperConsolidator <- function(...) Reduce(append, list(...))
> aln_000_100 <- superDuperConsolidator(aln000, aln050, aln100)

> aln_000_100
class: AlignedRead
length: 29644727 reads; width: 36 cycles
chromosome: chr11.fa chr13.fa ... chr10.fa chr1.fa
position: 100667123 52735524 ... 69579797 16009268
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
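
For reference, Reduce() folds its function left to right, so the call
above should be equivalent to append(append(aln000, aln050), aln100).
Here is a minimal sketch of that behaviour on plain integer vectors,
which are only stand-ins for illustration and not AlignedRead objects:

```r
# Same helper as above; base append() is used here because the inputs
# are ordinary vectors (stand-ins, not AlignedRead instances).
superDuperConsolidator <- function(...) Reduce(append, list(...))

# Folded call versus the explicit nested calls.
x <- superDuperConsolidator(1:2, 3:4, 5:6)
y <- append(append(1:2, 3:4), 5:6)

# Both routes build the same vector.
identical(x, y)
```

So each step appends one more object to the running result, and the
intermediate result grows with every step.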

Unsuccessful example:

Now I try to consolidate AlignedRead instances that are about twice as big:

> aln000
class: AlignedRead
length: 21845985 reads; width: 36 cycles
chromosome: chr17.fa chr1.fa ... chr18.fa chr9.fa
position: 41890422 142562489 ... 57003322 108499164
strand: - - ... - +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln050
class: AlignedRead
length: 21961352 reads; width: 36 cycles
chromosome: chr18.fa chr16.fa ... chr15.fa chr9.fa
position: 88900833 22029306 ... 102993167 83200074
strand: - - ... + -
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln100
class: AlignedRead
length: 20865366 reads; width: 36 cycles
chromosome: chr1.fa chr12.fa ... chr15.fa chr9.fa
position: 99986382 14243887 ... 93339870 75136974
strand: + - ... - +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig

> superDuperConsolidator <- function(...) Reduce(append, list(...))
> aln_000_100 <- superDuperConsolidator(aln000, aln050, aln100)
Error in .local(.Object, ...) :
  'length' must be a single non-negative integer
In addition: Warning message:
In width1 + width2 : NAs produced by integer overflow

I tried that with two different data sets; both failed. So, it is not
the data itself but the amount of data, I believe. The append()
function also fails when trying to consolidate two AlignedRead
instances, 50 million tags each.
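
A quick arithmetic check seems consistent with that. This is only a
sketch: I am assuming that the overflow comes from adding up the total
number of bases (reads x width) as a 32-bit integer, which I have not
confirmed in the package source. The read counts are copied from the
printouts above.

```r
# Read counts from the printouts above; every read is 36 cycles wide.
# Plain numeric (double) values are used so that this check cannot
# itself overflow.
width <- 36
small <- c(9465484, 8918057, 11261186)    # the set that worked
big   <- c(21845985, 21961352, 20865366)  # the set that failed

# Running total of bases after each append step.
bases_small <- cumsum(small) * width
bases_big   <- cumsum(big) * width

# Largest value an R integer can hold (2^31 - 1).
limit <- .Machine$integer.max

bases_small <= limit
bases_big <= limit
```

Under that assumption, every step of the small set stays below the
limit, while the big set crosses it at the last append. Two instances
of 50 million tags each would be 3.6e9 bases, also above the limit.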

Do you think that I have reached a limit, or is there a way to "grow"
AlignedRead instances slowly and gently?

By the way, I am using a server with very large memory now. So, memory
efficiency is far less important than successful consolidation.
sessionInfo() is the same.

Thank you,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

On Thu, Aug 27, 2009 at 6:45 PM, Martin Morgan<mtmorgan at fhcrc.org> wrote:
> Hi Ivan --
>
> Ivan Gregoretti wrote:
>>
>> Hello everybody,
>>
>> Is there any memory efficient way to consolidate multiple AlignedRead
>> objects into one?
>>
>>
>> Example:
>>
>> Let's say that I have 10 AlignedRead instances, 10 million tags each.
>> Let's call those instances aln01 through aln10.
>>
>> I can consolidate two of them like this:
>>
>> aln <- append(aln01, aln02)
>
> I don't think there's anything built-in. You could try this
>
>  superDuperConsolidator <- function(...)
>     Reduce(append, list(...))
>
> it might not be too bad memory-wise.
>
> Martin
>
>>
>> Can I consolidate all AlignedRead instances in a single shot? Something like
>> this:
>>
>> aln <- superDuperConsolidator(aln01, aln02, aln03, ..., aln10)
>>
>> Thank you,
>>
>> Ivan
>>
>> #########################################################
>>>
>>> sessionInfo()
>>
>> R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.3.27   lattice_0.17-25    BSgenome_1.13.10   Biostrings_2.13.34
>> [5] IRanges_1.3.60
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1
>>
>> #########################################################
>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>