[Bioc-devel] Efficient Random Sampling of Positions in GRanges
Bernat Gel
bgel at imppc.org
Tue Feb 16 18:19:53 CET 2016
Hi Dario,
You could use the regioneR package for that. It has functions to
create random regions and to randomize existing sets of regions along
the genome, optionally taking a set of masked regions into account.
In your case, you could create a mask for the genome that is the genome
minus your GRanges object, so the random regions will be placed only on
your regions.
For example, with a GRanges (A) with only two regions in chromosome 1:
library(regioneR)
A <- toGRanges(data.frame(chr=c("chr1", "chr1"), start=c(20, 100),
end=c(30, 150)))
# You need the hg19 BSgenome package installed to do this
hg19.genome <- getGenome("hg19")
hg19.mask <- subtractRegions(hg19.genome, A)
random.regions <- createRandomRegions(nregions=10, length.mean=1,
length.sd=0, genome=hg19.genome, mask=hg19.mask)
Although the randomization process can be a bit slow when dealing with
thousands of regions, it will be way faster than the approach you propose.
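If only single-base positions are needed, the sampling can also be done
without building a mask at all. The following is a minimal base R sketch,
not part of regioneR or GenomicRanges: the function name sample_positions
and the plain data.frame input (columns chr, start, end) are assumptions
made for illustration. It draws offsets into the concatenated widths of
the regions and maps each offset back to a coordinate, assuming the
regions do not overlap (with a GRanges, reduce() would merge them first).

```r
# Sample n single-base positions uniformly from a set of admissible regions.
# `regions` is a hypothetical data.frame with columns chr, start, end
# (1-based, inclusive coordinates; regions are assumed not to overlap).
sample_positions <- function(regions, n, replace = TRUE) {
  # Use doubles so the total width cannot overflow integer range.
  widths <- as.numeric(regions$end) - as.numeric(regions$start) + 1
  total <- sum(widths)
  if (!replace && n > total) {
    stop("cannot sample more positions than are available")
  }
  # Each offset in 1..total identifies one base in the concatenated regions.
  offsets <- sample.int(total, n, replace = replace)
  # Cumulative end offset of each region within the concatenation.
  ends <- cumsum(widths)
  # Index of the region containing each offset (ends[i - 1] < offset <= ends[i]).
  idx <- findInterval(offsets, ends, left.open = TRUE) + 1
  # Translate the offset within the region back to a genomic coordinate.
  pos <- regions$start[idx] + (offsets - (ends[idx] - widths[idx])) - 1
  data.frame(chr = regions$chr[idx], pos = pos, stringsAsFactors = FALSE)
}

# Same two regions as in the example above.
A <- data.frame(chr = c("chr1", "chr1"), start = c(20, 100), end = c(30, 150))
set.seed(1)
sampled <- sample_positions(A, 10)
```

Because each base gets exactly one offset, longer regions are picked
proportionally more often, and the cost depends on the number of regions
and samples rather than on the genome length.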
Hope this helps
Bernat
Bioinformatician, PhD.
Genetic Variation and Cancer
Genetic Diagnostic Unit of Hereditary Cancer (UDGCH-IMPPC)
Institut de Medicina Predictiva i Personalitzada del Càncer (IMPPC)
Campus de Can Ruti
Ctra de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Spain
Tel (+34) 93 557 28 36
bgel at imppc.org
http://www.imppc.org/en/
On 02/16/16 12:00, Dario Strbenac wrote:
> Hello,
>
> There is no convenience function to sample nucleotide positions from a GRanges object. My approach is to generate a GRanges of every chromosomal position with a width of 1, then find the overlaps with the desired ranges (admissible regions), then sample the positions that overlapped. The construction of the GRanges object containing every chromosome position is inefficient, as is finding its overlaps with another GRanges object. Could an optimised function for this task be added to GenomicRanges?
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel