[Bioc-devel] serializing pairwise alignment objects

Hervé Pagès hpages at fhcrc.org
Wed Nov 7 03:06:19 CET 2012


Hi Florian,

I just removed the 'substitutionArray' slot from PairwiseAlignments
objects in Biostrings 2.27.7. The slot didn't seem to be used/needed
by any downstream method.

   > packageVersion("Biostrings")
   [1] ‘2.27.7’
   > x <- "xxxabcdefghijklmnopqyyy"
   > y <- "abcdhijkzzzzlmnpqr"
   > pa <- pairwiseAlignment(x, y)
   > slotNames(pa)
   [1] "pattern"      "subject"      "type"         "score"        "gapOpening"
   [6] "gapExtension"
   > validObject(pa)
   [1] TRUE
   > object.size(pa)
   35528 bytes

... instead of 35308996 bytes! 3 orders of magnitude smaller :-)
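
What matters for mclapply and for saving to disk is the size of the
serialized raw vector rather than object.size(). Below is a minimal
base-R sketch of the effect, not the actual Biostrings code: the class
"ToyAlignment" and the array dimensions are made up for illustration
(only the slot name is borrowed from the real class), and an empty
array stands in for the empty matrix used in the workaround quoted
below.

```r
library(methods)

# Hypothetical toy class, NOT the real PairwiseAlignments definition.
setClass("ToyAlignment",
         representation(score = "numeric", substitutionArray = "array"))

# An object carrying a large array slot (100 * 100 * 10 doubles).
big <- new("ToyAlignment",
           score = 42,
           substitutionArray = array(0, dim = c(100, 100, 10)))

# Same object with the large slot swapped for an empty array.
small <- big
small@substitutionArray <- array(numeric(0), dim = c(0, 0, 0))

# Length of the raw vector that would be sent between processes
# or written to disk.
big_bytes   <- length(serialize(big, NULL))
small_bytes <- length(serialize(small, NULL))

cat("with array:   ", big_bytes, "bytes\n")
cat("without array:", small_bytes, "bytes\n")
```

The same length(serialize(pa, NULL)) call can be applied to a real
PairwiseAlignments object to see what a worker process has to ship back.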

Cheers,
H.


On 11/05/2012 03:45 AM, Hahne, Florian wrote:
> Indeed. I did not look that far into the implementation, it just seemed odd
> to me that the objects got that inflated. scoreOnly is not really that
> helpful if you want to deal with the actual alignments. The only
> reasonable application I see for it is if you want to rank a bunch of
> sequences by pairwise similarity. This gigantic memory footprint is really
> breaking things once you start doing a lot of these pairwise alignment
> operations in parallel. mclapply complains about not being able to turn
> such large objects into a raw vector, and serializing to disk quickly
> fills your hard drive. You also lose a lot of the time gained by parallel
> processing just by writing and loading gigabytes of data...
> I don't know enough about the internals of the PairwiseAlignments classes,
> but it seems that there must be a way to avoid having this huge array as
> part of the object. As a quick and dirty fix for now I just replaced the
> substitutionArray slot with an empty matrix and all the downstream
> operations that I wanted to do still work. Would be great if you could
> take a look into this, Hervé.
> Thanks,
> Florian
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


