[BioC] Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 14 23:55:17 CEST 2010

On 08/24/2010 07:31 PM, Steve Lianoglou wrote:
> Hi,
> Looping using any of the *ply (lapply, sapply, seqapply, etc.) seems
> to be significantly slower when you are iterating over a GRangesList
> vs. an IRangesList:
> R> library(GenomicFeatures)
> R> txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
>       package="GenomicFeatures"))
> R> xcripts <- transcriptsBy(txdb, 'gene')
> R> system.time(l1 <- sapply(xcripts, length))
>    user  system elapsed
>   2.298   0.003   2.302
> R> irl <- IRangesList(lapply(xcripts, ranges))
> R> system.time(l2 <- sapply(irl, length))
>    user  system elapsed
>   0.047   0.001   0.049

As an update, Patrick has improved performance roughly 10x in IRanges
1.7.40; there is still some way to go:

> replicate(5, system.time(lapply(xcripts, length)))
           [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.31 0.317 0.318 0.313 0.328
sys.self   0.00 0.002 0.000 0.002 0.000
elapsed    0.31 0.325 0.319 0.317 0.329
user.child 0.00 0.000 0.000 0.000 0.000
sys.child  0.00 0.000 0.000 0.000 0.000

> irl <- IRangesList(lapply(xcripts, ranges))

> replicate(5, system.time(lapply(irl, length)))
            [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.032 0.031 0.032 0.031 0.030
sys.self   0.000 0.000 0.000 0.001 0.001
elapsed    0.032 0.031 0.032 0.032 0.031
user.child 0.000 0.000 0.000 0.000 0.000
sys.child  0.000 0.000 0.000 0.000 0.000
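
For a rough sense of the remaining gap, the elapsed times in the two tables above can be compared directly. This is only a sketch in base R: the numbers are copied by hand from the output above, and the variable names are made up for illustration.

```r
# Elapsed times (seconds) copied from the two replicate() tables above.
grl_elapsed <- c(0.31, 0.325, 0.319, 0.317, 0.329)   # lapply over the GRangesList
irl_elapsed <- c(0.032, 0.031, 0.032, 0.032, 0.031)  # lapply over the IRangesList

# Medians are less sensitive than means to a single slow run.
grl_median <- median(grl_elapsed)
irl_median <- median(irl_elapsed)

# How many times slower the GRangesList loop is than the IRangesList loop.
ratio <- grl_median / irl_median
print(round(ratio, 1))
```

On these five runs the GRangesList loop is still about 10 times slower than the IRangesList loop, so the gap has narrowed but not closed.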


> R> identical(l1, l2)
> [1] TRUE
> I was curious if this is known/expected behavior and it's unavoidable, or .. ?
> Thanks,
> -steve
> R> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-08-21 r52791)
> Platform: i386-apple-darwin10.4.0/i386 (32-bit)
> locale:
> [1] C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-2          DBI_0.2-5              AnnotationDbi_1.11.4
> [5] Biobase_2.9.0          GenomicFeatures_1.1.11 GenomicRanges_1.1.20   IRanges_1.7.21
> loaded via a namespace (and not attached):
> [1] BSgenome_1.17.6    Biostrings_2.17.29 RCurl_1.4-3        XML_3.1-1          biomaRt_2.5.1
> [6] rtracklayer_1.9.7  tools_2.12.0

Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
