[Bioc-devel] a pattern to be avoided? mcols(x)$y <- z
Hervé Pagès
hpages at fredhutch.org
Wed Oct 3 17:20:46 CEST 2018
Hi Vince,
This issue was reported here a couple of weeks ago:
https://github.com/Bioconductor/GenomicRanges/issues/11
Internally $<- uses something like:
do.call(DataFrame, list(DF1, DF2))
to combine the metadata columns. However in some situations
the do.call(DataFrame, list(...)) form is **very** inefficient
compared to the more direct DataFrame(...) form:
  library(S4Vectors)

  DF1 <- DataFrame(a=Rle(11:1999, 1011:2999), b=5)
  DF2 <- DataFrame(c=Rle(12:2000, 1011:2999))

  system.time(DF12 <- do.call(DataFrame, list(DF1, DF2)))
  #    user  system elapsed
  #   4.476   0.000   4.476

  system.time(DF12b <- DataFrame(DF1, DF2))
  #    user  system elapsed
  #   0.002   0.000   0.001

  identical(DF12, DF12b)
  # [1] TRUE
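
Until the cause is sorted out, one possible workaround is to set the metadata column on the GRanges object directly instead of going through mcols(). This is only a minimal sketch, based on the timings in Vince's report quoted below, where GR$channel <- value took a fraction of a second while mcols(GR)$channel <- value took about 47 seconds. The toy object `gr` and its contents are made up for illustration, and the sketch does not claim the two forms are equivalent in every situation:

```r
library(GenomicRanges)

# Toy GRanges; object name and column values are hypothetical.
gr <- GRanges("chr1", IRanges(start = 1:6, width = 10))
mcols(gr)$channel450 <- c("Both", "Grn", "Grn", "Red", "Red", "Red")

# Build the replacement column once.
value <- Rle(as.factor(mcols(gr)$channel450))

# Form that was slow in the report: replacement through mcols().
gr1 <- gr
mcols(gr1)$channel <- value

# Form that was fast in the report: replacement on the GRanges itself.
gr2 <- gr
gr2$channel <- value

# Compare the resulting metadata columns on your own data before switching.
identical(mcols(gr1), mcols(gr2))
```

On a real object, wrapping each assignment in system.time() shows whether the direct form actually avoids the slow path in your installed versions.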
@Michael: Any idea what's going on?
Thanks,
H.
On 10/03/2018 07:01 AM, Vincent Carey wrote:
> The following comes up in use of Fdb.InfiniumMethylation.hg19::getPlatform
>
> debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))
>
> Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>   0.020   0.003   0.022
>
> Browse[3]> system.time(mcols(GR)$channel <-
>     Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>  47.263   0.067  47.373
>
> Browse[3]> GR$channel[1]
> factor-Rle of length 1 with 1 run
>   Lengths:    1
>   Values : Both
> Levels(3): Both Grn Red
>
> Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>   0.058   0.006   0.065
>
> Presumably the mcols()$<- copies/rewrites a lot of data needlessly?
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319