[Bioc-devel] serializing pairwise alignment objects

Hervé Pagès hpages at fhcrc.org
Fri Nov 2 19:02:36 CET 2012


Hi,

Looks like Benilton is right:

   > slotNames(pa)
   [1] "pattern"           "subject"           "type"
   [4] "score"             "substitutionArray" "gapOpening"
   [7] "gapExtension"
   > sapply(slotNames(pa), function(sname) object.size(slot(pa, sname)))
             pattern           subject              type             score
               17056             17056                96                48
   substitutionArray        gapOpening      gapExtension
            35295336                48                48
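
A quick arithmetic check. It assumes the 100 x 100 x 441 dimensions Benilton gives below and 8 bytes per double. On those assumptions, the raw payload of substitutionArray alone accounts for nearly all of the 35295336 bytes. The small remainder is presumably attribute overhead, such as dim and possibly dimnames. The sketch uses base R only. It builds a stand-in array of the same shape rather than calling Biostrings:

```r
# Expected payload of an array of doubles with the dimensions quoted in
# this thread (an assumption; not read from the Biostrings source).
dims <- c(100, 100, 441)
n_cells <- prod(dims)          # 4,410,000 cells
payload_bytes <- n_cells * 8   # 8 bytes per double
payload_bytes                  # 35280000, vs 35295336 for the real slot

# A stand-in array of the same shape: object.size() adds only a small
# header and the dim attribute on top of the raw payload.
a <- array(0, dim = dims)
as.numeric(object.size(a)) - payload_bytes
```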

I'm not sure why the substitutionArray would need to be stored in the
returned object (which downstream methods use it?). I would need to check.
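
The sapply() call above reports in-memory sizes. Here is a minimal sketch of the same idiom pointed at serialize() and memCompress(), both base R. It estimates what each slot costs once written out. Because save() gzip-compresses by default, this is presumably why Florian's file (about 22 MB) is smaller than object.size(pa) (about 35 MB). ToyAlignment is a hypothetical class invented for illustration, not the Biostrings class. Its large slot holds random values so that it does not compress away:

```r
library(methods)

# Hypothetical stand-in class: one tiny slot and one large array slot,
# mimicking the shape of the problem (NOT the real PairwiseAlignments class).
setClass("ToyAlignment", slots = c(score = "numeric", bigArray = "array"))

set.seed(1)
toy <- new("ToyAlignment", score = 42,
           bigArray = array(runif(100 * 100 * 44), dim = c(100, 100, 44)))

# Bytes each slot takes in an uncompressed serialization...
raw_sizes <- sapply(slotNames(toy), function(s)
  length(serialize(slot(toy, s), NULL)))

# ...and after gzip compression, roughly what save() writes by default.
gz_sizes <- sapply(slotNames(toy), function(s)
  length(memCompress(serialize(slot(toy, s), NULL), type = "gzip")))

raw_sizes
gz_sizes
```

Replacing toy with the real pa object would presumably show substitutionArray dominating both measurements.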

H.


On 11/02/2012 09:41 AM, Benilton Carvalho wrote:
> Ditto.
>
> But isn't it just because the resulting object 'pa' contains the
> substitutionArray slot (a 100 x 100 x 441 array of doubles)? Maybe
> scoreOnly=TRUE is relevant in some cases?
>
> b
>
>
> On 2 November 2012 15:53, Wolfgang Huber <whuber at embl.de> wrote:
>
>> Hi,
>>
>> I can reproduce this on more recent versions of everything:
>>
>>> sessionInfo()
>> R Under development (unstable) (2012-10-31 r61057)
>> Platform: x86_64-apple-darwin12.2.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Biostrings_2.27.5  IRanges_1.17.7     BiocGenerics_0.5.1 fortunes_1.5-0
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.16.0
>>
>> Best wishes
>>          Wolfgang
>>
>> On Nov 2, 2012, at 9:32 AM, "Hahne, Florian" <
>> florian.hahne at novartis.com> wrote:
>>
>>> Hi all,
>>> I just realized that serialized PairwiseAlignmentsSingleSubject objects
>>> grow ridiculously large:
>>>
>>> x <- "xxxabcdefghijklmnopqyyy"
>>> y <- "abcdhijkzzzzlmnpqr"
>>> pa <- pairwiseAlignment(x,y)
>>> save(pa, file="~/tmp/pa.rda")
>>> file.info("~/tmp/pa.rda")
>>>                  size isdir mode               mtime               ctime
>>> ~/tmp/pa.rda 22651025 FALSE  644 2012-11-02 09:23:09 2012-11-02 09:23:09
>>>                            atime   uid   gid    uname   grname
>>> ~/tmp/pa.rda 2012-11-02 09:23:07 11281 11281 hahnefl1 hahnefl1
>>>
>>>
>>>
>>> 22 MB for this trivial alignment seems to be a little excessive.
>>>
>>> Interestingly, the object itself has quite an impressive memory footprint:
>>> object.size(pa)
>>> 35308996 bytes
>>>
>>>
>>> Any idea what is going on here? Looks like a memory leak to me.
>>>
>>>
>>> Florian
>>>
>>> sessionInfo()
>>> R version 2.15.1 RC (2012-06-21 r59599)
>>> Platform: i386-apple-darwin11.4.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.26.2   IRanges_1.16.2      BiocGenerics_0.4.0
>>> [4] BiocInstaller_1.8.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] parallel_2.15.1 stats4_2.15.1   tools_2.15.1
>>>
>>>
>>>
>>> --
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


