[BioC] Faster way to apply a function than endoapply on 40K GrangesList

Hervé Pagès hpages at fhcrc.org
Thu Aug 11 08:36:27 CEST 2011


Hi,

Note that the "flank" method for GRanges objects is strand aware and
in this context "end" means "3' end". If that's not what you want,
just do something like

   start(grl) <- end(grl) + 1L
   end(grl) <- end(grl) + 10000L

and you are done.

But if you really want the "3' end", I can't think of an easy/efficient
way, unless you do something *highly* NOT recommended like accessing
directly a slot of your GRangesList object:

<NEVER DO THIS!>

   grl at unlistData <- flank(grl at unlistData, 10000, start=FALSE)

</NEVER DO THIS!>

Of course we should provide a "flank" method for GRangesList
objects that will do the above so the user doesn't need to use
a dangerous/forbidden trick ;-)

H.


On 11-08-10 02:48 PM, Martin Morgan wrote:
> On 08/10/2011 12:36 PM, Lakshmanan Iyer wrote:
>> Hi,
>> I am trying to add 10K to the end of the intervals in a GRangesList
>> object using endoapply and flank using the comand:
>> p3UTRs10K<- endoapply(p3UTRs,function (z) flank (z, 10000,start=FALSE));
>
> maybe along the lines of
>
> example(GRangesList)
> ranges(grl) = flank(ranges(grl), 10000, start=FALSE)
>
> which gives
>
>  > end(grl)
> CompressedIntegerList of length 3
> [["gr1"]] 6
> [["gr2"]] 9 15
> [["gr3"]] 3 9
>  > ranges(grl) = flank(ranges(grl), 1000, start=FALSE)
>  > start(grl)
> CompressedIntegerList of length 3
> [["gr1"]] 7
> [["gr2"]] 10 16
> [["gr3"]] 4 10
>  > end(grl)
> CompressedIntegerList of length 3
> [["gr1"]] 1006
> [["gr2"]] 1009 1015
> [["gr3"]] 1003 1009
>
> ? Martin
>
>>
>> I am writing to find out if someone has a suggestion for a faster way
>> of doing this? With 40000 elements it is taking a long time..
>>
>> Here is what I am doing:
>>
>> class (p3UTRs)
>> [1] "GRangesList"
>> attr(,"package")
>> [1] "GenomicRanges"
>>
>> length (p3UTRs)
>> [1] 41053
>>
>> p3UTRs10K<- endoapply(p3UTRs,function (z) flank (z, 10000,start=FALSE));
>>
>>> sessionInfo()
>> R version 2.13.0 (2011-04-13)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.17 GenomicFeatures_1.4.0
>> [3] leeBamViews_0.99.11 BSgenome_1.20.0
>> [5] Biobase_2.12.1 Rsamtools_1.4.1
>> [7] Biostrings_2.20.0 GenomicRanges_1.4.3
>> [9] IRanges_1.10.0 rtracklayer_1.12.0
>> [11] RCurl_1.5-0 bitops_1.0-4.1
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.8.0 DBI_0.2-5 RSQLite_0.9-4 tools_2.13.0 XML_3.2-0
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list