[Bioc-devel] serializing pairwise alignment objects
Hahne, Florian
florian.hahne at novartis.com
Wed Nov 7 08:41:19 CET 2012
Great, Herve,
thanks a lot!
Florian
--
On 11/7/12 3:06 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>Hi Florian,
>
>I just removed the 'substitutionArray' slot from PairwiseAlignments
>objects in Biostrings 2.27.7. The slot didn't seem to be used/needed
>by any downstream method.
>
> > packageVersion("Biostrings")
> [1] ‘2.27.7’
> > x <- "xxxabcdefghijklmnopqyyy"
> > y <- "abcdhijkzzzzlmnpqr"
> > pa <- pairwiseAlignment(x, y)
> > slotNames(pa)
> [1] "pattern"      "subject"      "type"         "score"        "gapOpening"
> [6] "gapExtension"
> > validObject(pa)
> [1] TRUE
> > object.size(pa)
> 35528 bytes
>
>... instead of 35308996 bytes! 3 orders of magnitude smaller :-)
>
>Cheers,
>H.
>
>
>On 11/05/2012 03:45 AM, Hahne, Florian wrote:
>> Indeed. I did not look that far into the implementation; it just seemed
>> odd to me that the objects got that inflated. scoreOnly is not really
>> that helpful if you want to deal with the actual alignments. The only
>> reasonable application I see for it is if you want to rank a bunch of
>> sequences by pairwise similarity. This gigantic memory footprint is
>> really breaking things once you start doing a lot of these pairwise
>> alignment operations in parallel. mclapply complains about not being
>> able to turn such large objects into a raw vector, and serializing to
>> disk quickly fills your hard drive. You also lose a lot of the time
>> gained by parallel processing just by writing and loading gigabytes of
>> data...
>> I don't know enough about the internals of the PairwiseAlignments
>> classes, but it seems that there must be a way to avoid having this huge
>> array as part of the object. As a quick and dirty fix for now I just
>> replaced the substitutionArray slot with an empty matrix, and all the
>> downstream operations that I wanted to do still work. It would be great
>> if you could take a look into this, Herve.
>> Thanks,
>> Florian
>>
>
>--
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone: (206) 667-5791
>Fax: (206) 667-1319
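A minimal base-R sketch of why one large array slot dominates both object.size() and the raw vector produced by serialize(). That raw vector is roughly what gets written to disk, or moved back from each forked worker by mclapply. The class ToyAlignment is hypothetical and is NOT the Biostrings PairwiseAlignments class. It simply mimics Florian's quick-and-dirty workaround of replacing the big slot with an empty matrix.

```r
library(methods)

# HYPOTHETICAL stand-in class for illustration only. It is NOT the
# Biostrings PairwiseAlignments class; it just has one small slot and
# one potentially huge array slot.
setClass("ToyAlignment",
         slots = c(score = "numeric", substitutionArray = "array"))

# 100 x 100 x 100 doubles: about 8 MB of payload in a single slot
big <- array(0, dim = c(100, 100, 100))
fat <- new("ToyAlignment", score = 42, substitutionArray = big)

# The workaround described in the thread: keep the object, but replace
# the large array slot with an empty 0 x 0 matrix (a matrix is an array)
slim <- fat
slim@substitutionArray <- matrix(numeric(0), nrow = 0, ncol = 0)

# serialize(x, NULL) returns the raw vector that would be written to
# disk or shipped between processes
fat_bytes  <- length(serialize(fat,  NULL))
slim_bytes <- length(serialize(slim, NULL))

cat("object.size fat: ", format(object.size(fat),  units = "auto"), "\n")
cat("object.size slim:", format(object.size(slim), units = "auto"), "\n")
cat("serialized bytes:", fat_bytes, "vs", slim_bytes, "\n")
```

Removing the slot from the class altogether, as done in Biostrings 2.27.7, avoids carrying even the empty placeholder.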