[BioC] Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?
Martin Morgan
mtmorgan at fhcrc.org
Thu Oct 14 23:55:17 CEST 2010
On 08/24/2010 07:31 PM, Steve Lianoglou wrote:
> Hi,
>
> Looping using any of the *ply (lapply, sapply, seqapply, etc.) seems
> to be significantly slower when you are iterating over a GRangesList
> vs. an IRangesList:
>
> R> library(GenomicFeatures)
> R> txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
>                                     package="GenomicFeatures"))
> R> xcripts <- transcriptsBy(txdb, 'gene')
> R> system.time(l1 <- sapply(xcripts, length))
>    user  system elapsed
>   2.298   0.003   2.302
>
> irl <- IRangesList(lapply(xcripts, ranges))
> system.time(l2 <- sapply(irl, length))
>    user  system elapsed
>   0.047   0.001   0.049
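As an update, Patrick has improved performance by roughly 10x in IRanges
1.7.40, but there is still some way to go: lapply over the GRangesList
remains about 10x slower than over the equivalent IRangesList.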
> replicate(5, system.time(lapply(xcripts, length)))
           [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.31 0.317 0.318 0.313 0.328
sys.self   0.00 0.002 0.000 0.002 0.000
elapsed    0.31 0.325 0.319 0.317 0.329
user.child 0.00 0.000 0.000 0.000 0.000
sys.child  0.00 0.000 0.000 0.000 0.000
> irl <- IRangesList(lapply(xcripts, ranges))
> replicate(5, system.time(lapply(irl, length)))
            [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.032 0.031 0.032 0.031 0.030
sys.self   0.000 0.000 0.000 0.001 0.001
elapsed    0.032 0.031 0.032 0.032 0.031
user.child 0.000 0.000 0.000 0.000 0.000
sys.child  0.000 0.000 0.000 0.000 0.000
Martin
>
> R> identical(l1, l2)
> [1] TRUE
>
> I was curious if this is known/expected behavior and it's unavoidable, or .. ?
>
> Thanks,
> -steve
>
> R> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-08-21 r52791)
> Platform: i386-apple-darwin10.4.0/i386 (32-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-2          DBI_0.2-5              AnnotationDbi_1.11.4
> [5] Biobase_2.9.0          GenomicFeatures_1.1.11 GenomicRanges_1.1.20   IRanges_1.7.21
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.17.6    Biostrings_2.17.29 RCurl_1.4-3        XML_3.1-1          biomaRt_2.5.1
> [6] rtracklayer_1.9.7  tools_2.12.0
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793