[Bioc-devel] a pattern to be avoided? mcols(x)$y <- z

Pages, Herve hpages at fredhutch.org
Wed Nov 7 01:44:10 CET 2018


Hi Vince,

It looks like Michael took care of this in devel (thanks Michael):

   https://github.com/Bioconductor/GenomicRanges/issues/11

H.


On 10/3/18 08:20, Hervé Pagès wrote:
> Hi Vince,
>
> This issue was reported here a couple of weeks ago:
>
> https://github.com/Bioconductor/GenomicRanges/issues/11
>
> Internally $<- uses something like:
>
>   do.call(DataFrame, list(DF1, DF2))
>
> to combine the metadata columns. However in some situations
> the do.call(DataFrame, list(...)) form is **very** inefficient
> compared to the more direct DataFrame(...) form:
>
>   library(S4Vectors)
>   DF1 <- DataFrame(a=Rle(11:1999, 1011:2999), b=5)
>   DF2 <- DataFrame(c=Rle(12:2000, 1011:2999))
>   system.time(DF12 <- do.call(DataFrame, list(DF1, DF2)))
>   #   user  system elapsed
>   #  4.476   0.000   4.476
>   system.time(DF12b <- DataFrame(DF1, DF2))
>   #   user  system elapsed
>   #  0.002   0.000   0.001
>   identical(DF12, DF12b)
>   # [1] TRUE
>
> @Michael: Any idea what's going on?
>
> Thanks,
> H.
>
>
> On 10/03/2018 07:01 AM, Vincent Carey wrote:
>> The following comes up in use of Fdb.InfiniumMethylation.hg19::getPlatform:
>>
>> debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))
>>
>> Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
>>     user  system elapsed
>>    0.020   0.003   0.022
>>
>> Browse[3]> system.time(mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450)))
>>     user  system elapsed
>>   47.263   0.067  47.373
>>
>> Browse[3]> GR$channel[1]
>> factor-Rle of length 1 with 1 run
>>    Lengths:    1
>>    Values : Both
>> Levels(3): Both Grn Red
>>
>> Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
>>     user  system elapsed
>>    0.058   0.006   0.065
>>
>> Presumably the mcols()$<- copies/rewrites a lot of data needlessly?
>>
>
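For anyone still on a GenomicRanges release that predates Michael's fix, Vince's timings point to a workaround: assign the new column with $<- on the GRanges itself rather than going through mcols(x)$y <- z. Below is a minimal sketch, assuming GenomicRanges is installed. The toy object and its channel450 values are hypothetical stand-ins for the object built by getPlatform, and the sketch only illustrates the assignment pattern; it makes no timing claims.

```r
# Assumes the Bioconductor package GenomicRanges is installed
# (it attaches IRanges and S4Vectors, which provide IRanges() and Rle()).
library(GenomicRanges)

# Hypothetical toy GRanges with a character metadata column channel450,
# standing in for the object built inside getPlatform.
GR <- GRanges(
    seqnames   = "chr1",
    ranges     = IRanges(start = 1:6, width = 1),
    channel450 = c("Both", "Both", "Grn", "Red", "Red", "Both")
)

# Pattern reported as slow in this thread (before the fix in devel):
#   mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))
# Alternative that was fast in Vince's timings: $<- on the GRanges
# itself, which also stores the value as a metadata column.
GR$channel <- Rle(as.factor(mcols(GR)$channel450))

# The new column is a factor-Rle stored next to channel450 in mcols(GR).
GR$channel
```

Both forms add the same metadata column; how much time the second one saves depends on the size of the object and on the GenomicRanges/S4Vectors versions in use.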
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
