[Bioc-devel] Combining Ordinary List of GRanges Optimisation

Hervé Pagès hpages at fhcrc.org
Tue Jan 8 19:44:21 CET 2013


Hi Dario,

On 01/06/2013 07:00 PM, Dario Strbenac wrote:
>> Are you asking if you can rewrite your code to work faster, or are you asking if the BioC devs need to improve the code to be faster?
>
> I was suggesting that maybe the c function for GRanges could be optimised.
>
>> Another would be manually splitting each GRanges object into its components: seqnames, IRanges, strand, and metadata. Then concatenate these components and build one big GRanges object.
>
> This approach gives:
>
>     user  system elapsed
>   63.488  11.092  74.786

I think this is roughly the timing 'do.call(c, blockRanges)' would give
you if all your GRanges objects were "naked", i.e., if they had no meta
columns.
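
For concreteness, here is a minimal sketch of the manual approach
described above. It assumes only seqnames, ranges and strand need to be
carried over: meta columns and seqinfo (e.g. seqlengths) are NOT
propagated. The helper name combineComponents() and the toy inputs are
hypothetical; only standard GenomicRanges/IRanges accessors are used.

```r
suppressPackageStartupMessages(library(GenomicRanges))

## Hypothetical helper: concatenate a plain list of GRanges component by
## component, then build one big GRanges. Meta columns and seqinfo are dropped.
combineComponents <- function(grl) {
    ## Flatten seqnames and strand as character vectors across all elements.
    sn <- unlist(lapply(grl, function(gr) as.character(seqnames(gr))),
                 use.names = FALSE)
    st <- unlist(lapply(grl, function(gr) as.character(strand(gr))),
                 use.names = FALSE)
    ## Flatten the range coordinates.
    starts <- unlist(lapply(grl, start), use.names = FALSE)
    ends <- unlist(lapply(grl, end), use.names = FALSE)
    ## Rebuild a single GRanges from the concatenated components.
    GRanges(seqnames = sn,
            ranges = IRanges(start = starts, end = ends),
            strand = st)
}

## Toy usage with two small, hypothetical GRanges objects.
gr1 <- GRanges("chr1", IRanges(1:3, width = 10), strand = "+")
gr2 <- GRanges("chr2", IRanges(5, 20), strand = "-")
allRanges <- combineComponents(list(gr1, gr2))
```

Working with plain character/integer vectors avoids dispatching c() on
many S4 objects, at the price of losing anything beyond the three core
components.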

>
> which, using c, was previously:
>
>     user  system elapsed
> 935.770  23.657 961.952

By default, c() also combines the meta columns, which can be
expensive if you have a lot of them and/or if some of them are
complicated objects. You can call c() with 'ignore.mcols=TRUE'
if you don't need to propagate the meta columns. In the
context of do.call(), this translates to something like:

   allRanges <- do.call(c, c(blockRanges, list(ignore.mcols=TRUE)))
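
A small self-contained illustration of the idiom above, on hypothetical
toy data (100 GRanges objects, each with two made-up meta columns,
'score' and 'name'). It only checks what each call returns; wrapping
either do.call() in system.time() is one way to compare timings on real
objects, and the difference will depend on how many meta columns there
are and how complex they are.

```r
suppressPackageStartupMessages(library(GenomicRanges))

## Hypothetical input: 100 small GRanges, each carrying two meta columns.
blockRanges <- lapply(seq_len(100), function(i) {
    GRanges("chr1", IRanges(start = i * 100 + 1:5, width = 10),
            score = runif(5), name = paste0("r", i, "_", 1:5))
})

## Default: the meta columns are combined along with the ranges.
withMcols <- do.call(c, blockRanges)

## Meta columns ignored: the extra argument is appended to the list
## so that do.call() passes it to c() as a named argument.
noMcols <- do.call(c, c(blockRanges, list(ignore.mcols = TRUE)))
```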

IMPORTANT NOTE, related to this thread on the Bioconductor list:

   https://stat.ethz.ch/pipermail/bioconductor/2012-November/049567.html

In short: if we ask the R core guys to change the implicit c() generic,
my understanding is that it won't be possible to support additional
args in "c" methods anymore, like the 'ignore.mcols' arg of the method
for GenomicRanges objects. Should we take the time to discuss this
before I proceed?

Thanks,
H.

>
> Thanks for the tip. I now remember using this approach at some time in the past.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
