[Bioc-devel] serializing pairwise alignment objects
Hervé Pagès
hpages at fhcrc.org
Wed Nov 7 03:06:19 CET 2012
Hi Florian,
I just removed the 'substitutionArray' slot from PairwiseAlignments
objects in Biostrings 2.27.7. The slot didn't seem to be used/needed
by any downstream method.
> packageVersion("Biostrings")
[1] ‘2.27.7’
> x <- "xxxabcdefghijklmnopqyyy"
> y <- "abcdhijkzzzzlmnpqr"
> pa <- pairwiseAlignment(x, y)
> slotNames(pa)
[1] "pattern"      "subject"      "type"         "score"        "gapOpening"
[6] "gapExtension"
> validObject(pa)
[1] TRUE
> object.size(pa)
35528 bytes
... instead of 35308996 bytes! 3 orders of magnitude smaller :-)
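To reproduce this kind of before/after measurement without Biostrings installed, here is a minimal base-R sketch. The class `ToyAlignment` and its `lookup` slot are hypothetical stand-ins, not the real PairwiseAlignments layout. It compares both the in-memory size and the length of the serialized raw vector, since the latter is what matters when objects are shipped between processes or saved to disk.

```r
library(methods)

# Hypothetical toy class; NOT the Biostrings PairwiseAlignments class.
setClass("ToyAlignment",
         representation(score = "numeric", lookup = "array"))

# A large 3-d array stands in for a bulky slot that downstream code never reads.
big <- new("ToyAlignment", score = 1,
           lookup = array(0L, dim = c(100, 100, 100)))

# The same kind of object with the bulky slot emptied.
small <- new("ToyAlignment", score = 1,
             lookup = array(integer(0), dim = c(0, 0, 0)))

# In-memory footprint, as reported by object.size().
size_big   <- as.numeric(object.size(big))
size_small <- as.numeric(object.size(small))

# Length of the raw vector produced by serialize(); this is what gets
# written to disk or passed between processes.
raw_big   <- length(serialize(big, NULL))
raw_small <- length(serialize(small, NULL))

cat("object.size ratio:", round(size_big / size_small), "\n")
cat("serialized ratio: ", round(raw_big / raw_small), "\n")
```

The exact ratios depend on the R version and platform, so none are quoted here; the point is only that one unused slot can dominate both numbers.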
Cheers,
H.
On 11/05/2012 03:45 AM, Hahne, Florian wrote:
> Indeed. I did not look that far into the implementation, it just seemed odd
> to me that the objects got that inflated. scoreOnly is not really that
> helpful if you want to deal with the actual alignments. The only
> reasonable application I see for it is if you want to rank a bunch of
> sequences by pairwise similarity. This gigantic memory footprint is really
> breaking things once you start doing a lot of these pairwise alignment
> operations in parallel. mclapply complains about not being able to turn
> such large objects into a raw vector, and serializing to disk quickly
> fills your hard drive. You also lose a lot of the time gained by parallel
> processing just by writing and loading gigabytes of data...
> I don't know enough about the internals of the PairwiseAlignments classes,
> but it seems that there must be a way to avoid having this huge array as
> part of the object. As a quick and dirty fix for now I just replaced the
> substitutionArray slot with an empty matrix and all the downstream
> operations that I wanted to do still work. Would be great if you could
> take a look into this, Herve.
> Thanks,
> Florian
>
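The workaround described in the quoted message (emptying a bulky slot before the object is serialized) can be sketched in base R as follows. This is an illustration under assumptions: `ToyResult`, its `scratch` slot, and the helpers `make_result()` and `strip_scratch()` are hypothetical names, not part of Biostrings, and plain `lapply()` is used so the sketch runs on any platform.

```r
library(methods)

# Hypothetical stand-in class; slot names are illustrative only.
setClass("ToyResult",
         representation(score = "numeric", scratch = "matrix"))

# Hypothetical worker: returns a result that carries a large scratch matrix.
make_result <- function(i) {
  new("ToyResult", score = i, scratch = matrix(0, nrow = 500, ncol = 500))
}

# Replace the bulky slot with an empty matrix before the object leaves the worker.
strip_scratch <- function(obj) {
  obj@scratch <- matrix(numeric(0), nrow = 0, ncol = 0)
  validObject(obj)  # the stripped object should still be a valid instance
  obj
}

full <- lapply(1:4, make_result)
slim <- lapply(1:4, function(i) strip_scratch(make_result(i)))

# Size of what would have to be serialized in each case.
bytes_full <- length(serialize(full, NULL))
bytes_slim <- length(serialize(slim, NULL))
cat("serialized bytes, full:", bytes_full, " stripped:", bytes_slim, "\n")
```

With `parallel::mclapply()` in place of `lapply()`, each worker's return value is serialized back to the parent process, so stripping inside the worker function is what keeps the transferred data small.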
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319