[BioC] Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Oct 15 06:10:59 CEST 2010
On Thu, Oct 14, 2010 at 11:07 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 10/14/2010 04:04 PM, Steve Lianoglou wrote:
>> On Thu, Oct 14, 2010 at 5:55 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> <snip>
>>> As an update, Patrick has improved performance 10x-ish in IRanges
>>> 1.7.40, still some more to go...
>>>
>>> > replicate(5, system.time(lapply(xcripts, length)))
>>>            [,1]  [,2]  [,3]  [,4]  [,5]
>>> user.self  0.31 0.317 0.318 0.313 0.328
>>> sys.self   0.00 0.002 0.000 0.002 0.000
>>> elapsed    0.31 0.325 0.319 0.317 0.329
>>> user.child 0.00 0.000 0.000 0.000 0.000
>>> sys.child  0.00 0.000 0.000 0.000 0.000
>>>
>>> > irl <- IRangesList(lapply(xcripts, ranges))
>>>
>>> > replicate(5, system.time(lapply(irl, length)))
>>>             [,1]  [,2]  [,3]  [,4]  [,5]
>>> user.self  0.032 0.031 0.032 0.031 0.030
>>> sys.self   0.000 0.000 0.000 0.001 0.001
>>> elapsed    0.032 0.031 0.032 0.032 0.031
>>> user.child 0.000 0.000 0.000 0.000 0.000
>>> sys.child  0.000 0.000 0.000 0.000 0.000
>>
>> Awesome!
>>
>> Thanks for dumping some brain power into this.
>>
>> Out of curiosity: I have several lists of serialized GRanges objects
>> which I had to regenerate with the introduction of isCircular (or
>> whatever it was) because of binary incompatibility with old/new
>> versions of GRanges.
>>
>> Do these updates break any binary compatibility or anything? I'm not
>> complaining, I just want to make sure I avoid updating until I can get
>> "out of the woods" and find time to regenerate these things ;-).
>
> No, the speed-up did not involve changes in class structure.
Nice.
> Have you tried updateObject on your objects?
No (I didn't even know it was there *blush*).
It's not exactly clear to me how I would have done that, though. If I
remember correctly R was failing inside the load() call, so I didn't
have a chance to updateObject() anything ... does that make sense?
Imagine I had a file called "genes.rda" which consisted of one object:
a list of GRanges objects called `genes`.
I thought I was getting an error right after load("genes.rda"). Can I
suppress validity checks for a minute while I load "genes.rda", then
run `genes <- lapply(genes, updateObject)`, or something?
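
To make the question concrete, here is a rough sketch of the workflow I have in mind. It assumes load() itself gets through on the old file, which is exactly the part I'm unsure about. `load_and_update` is a name I'm making up for illustration, and `update_fun` would be updateObject once GenomicRanges is attached:

```r
# Hypothetical helper, not part of any package: load an .rda file into a
# scratch environment and apply an updater to each element of one list in it.
load_and_update <- function(path, name, update_fun) {
  env <- new.env()
  load(path, envir = env)              # objects land in env, not the workspace
  objs <- get(name, envir = env)       # e.g. the list of GRanges called `genes`
  lapply(objs, update_fun)             # update each element in turn
}

# Intended use (assumes library(GenomicRanges) has been called):
# genes <- load_and_update("genes.rda", "genes", updateObject)
```

Loading into a separate environment just keeps the stale objects out of my workspace until they have been updated; it does nothing to help if load() itself throws the error.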
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact