[Bioc-devel] Efficient Random Sampling of Positions in GRanges

Hervé Pagès hpages at fredhutch.org
Tue Feb 16 19:45:48 CET 2016


Hi Dario,

On 02/16/2016 03:00 AM, Dario Strbenac wrote:
> Hello,
>
> There is no convenience function to sample nucleotide positions from a GRanges object. My approach is to generate a GRanges of every chromosomal position with a width of 1, then find the overlaps with the desired ranges (admissible regions), then sample the positions that overlapped.

You can replace these 3 steps with the following 1-liner by using a
GPos object (new in GenomicRanges devel):

   sample(GPos(admissible_regions), ...)

It should also be more efficient.
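
To illustrate why skipping the every-position GRanges can be cheaper, here
is a minimal base-R sketch of sampling positions straight from the range
boundaries. It is not how GPos is implemented; the single-chromosome
starts/ends and the helper name sample_positions() are made up for the
example.

```r
# Hypothetical admissible regions on one chromosome (1-based, inclusive).
starts  <- c(101L, 501L, 2001L)
ends    <- c(150L, 520L, 2300L)
widths  <- ends - starts + 1L
offsets <- c(0L, cumsum(widths))   # cumulative number of positions before each region

# Hypothetical helper: draw n distinct positions without materializing
# every admissible position.
sample_positions <- function(n) {
  # Sample indices into the (virtual) concatenation of all regions.
  idx <- sort(sample.int(sum(widths), n))
  # Map each 0-based index to the region that contains it.
  region <- findInterval(idx - 1L, offsets)
  # Convert the index to a genomic coordinate inside that region.
  starts[region] + (idx - 1L - offsets[region])
}

set.seed(1)
pos <- sample_positions(100)
```

Only the region boundaries and the sampled indices are held in memory, so
the cost depends on the number of regions and samples rather than on the
total number of admissible positions.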

From the GPos man page:

      The GPos class is a container for storing a set of genomic
      _positions_, that is, genomic ranges of width 1. Even though a
      GRanges object can be used for that, using a GPos object can be
      much more memory-efficient, especially when the object contains
      long runs of adjacent positions.

So you might improve efficiency a little bit by making sure that the
sampling preserves the order of the genomic positions, e.g. with
something like this:

   gpos <- GPos(admissible_regions)
   ## Extract 1000 random genomic positions:
   gpos[sort(sample(length(gpos), 1000))]
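
A small base-R check of the indexing step, using a plain integer count in
place of length(gpos) (the value of n below is an arbitrary assumption):
sort() returns the sampled indices themselves in increasing order, whereas
order() returns the permutation of 1..1000 that would sort them, so only
sort() keeps the random draw.

```r
set.seed(42)
n   <- 1000000L            # stand-in for length(gpos)
idx <- sample(n, 1000)     # 1000 distinct random indices in 1..n

by_sort  <- sort(idx)      # the sampled indices, in increasing order
by_order <- order(idx)     # a permutation of 1..1000, not the sampled indices

range(by_order)            # always 1 and 1000, whatever n is
identical(by_sort, idx[by_order])
```

Subsetting with by_sort selects the randomly drawn positions in genomic
order; subsetting with by_order would only ever select the first 1000
elements.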

H.

> The construction of the GRanges object containing every chromosome position is inefficient, as is finding its overlaps with another GRanges object. Could an optimised function for this task be added to GenomicRanges?
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


