[Bioc-devel] Efficient Random Sampling of Positions in GRanges

Bernat Gel bgel at imppc.org
Tue Feb 16 18:19:53 CET 2016


Hi Dario,

You could use the regioneR package for that. It has functions to create
random regions and to randomize existing sets of regions along the
genome, and both can take a set of masked regions into account.

In your case, you could create a mask for the genome that is the genome 
minus your GRanges object, so the random regions will be placed only on 
your regions.

For example, with a GRanges (A) with only two regions in chromosome 1:

library(regioneR)

A <- toGRanges(data.frame(chr=c("chr1", "chr1"), start=c(20, 100),
                          end=c(30, 150)))

# You need the hg19 BSgenome package installed to do this
hg19.genome <- getGenome("hg19")
hg19.mask <- subtractRegions(hg19.genome, A)

random.regions <- createRandomRegions(nregions=10, length.mean=1,
                                      length.sd=0, genome=hg19.genome,
                                      mask=hg19.mask)



Although the randomization process can be a bit slow when dealing with 
thousands of regions, it will be way faster than the approach you propose.
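
If you only need single-base positions, the idea can also be sketched in
base R without building a GRanges of every position: pick a range with
probability proportional to its width, then pick a uniform offset inside
it. This is only a minimal sketch and not something regioneR or
GenomicRanges provides; sample_positions is a hypothetical helper, and
it assumes non-overlapping, 1-based inclusive ranges on a single
chromosome and sampling with replacement.

```r
# Hypothetical helper, not part of regioneR or GenomicRanges.
# starts/ends: 1-based inclusive coordinates of non-overlapping ranges
# (assumed to be on the same chromosome).
sample_positions <- function(starts, ends, n) {
  stopifnot(length(starts) == length(ends), all(ends >= starts))
  widths <- ends - starts + 1
  # Choose a range for each draw with probability proportional to its
  # width, so every admissible base is equally likely overall.
  idx <- sample.int(length(widths), size = n, replace = TRUE, prob = widths)
  # Choose a uniform offset inside the chosen range (0 .. width - 1).
  offset <- floor(runif(n) * widths[idx])
  data.frame(range = idx, pos = starts[idx] + offset)
}

# Same two regions as in the example above.
set.seed(1)
s <- sample_positions(starts = c(20, 100), ends = c(30, 150), n = 10)
```

With a GRanges object, start() and end() would supply the two vectors,
and overlapping ranges would need to be merged first (for example with
reduce()) so that no base is counted twice.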

Hope this helps

Bernat

Bioinformatician, PhD.
Genetic Variation and Cancer
Genetic Diagnostic Unit of Hereditary Cancer (UDGCH-IMPPC)
Institut de Medicina Predictiva i Personalitzada del Càncer (IMPPC)
Campus de Can Ruti
Ctra de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Spain

Tel (+34) 93 557 28 36
bgel at imppc.org

http://www.imppc.org/en/

On 02/16/16 12:00, Dario Strbenac wrote:
> Hello,
>
> There is no convenience function to sample nucleotide positions from a GRanges object. My approach is to generate a GRanges of every chromosomal position with a width of 1, then find the overlaps with the desired ranges (admissible regions), then sample the positions that overlapped. The construction of the GRanges object containing every chromosome position is inefficient, as is finding its overlaps with another GRanges object. Could an optimised function for this task be added to GenomicRanges?
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


