[Bioc-sig-seq] Random Intervals in GenomicRanges

Anna Terry anna.terry at csc.mrc.ac.uk
Mon Dec 6 12:03:14 CET 2010


Hi Michael,

Thanks for this - I used runif to write the following function, which 
will return the same number of fixed width regions for each chromosome 
as a given genomic ranges object and make sure that they are non 
overlapping.

Cheers
Anna

randomGRangesFixedWidth <- function(granges, width) {
   random.gr <- GRanges()
   for(i in 1:length(seqnames(granges)@values)) {
     curr.ir <- IRanges()
     while(length(curr.ir) < seqnames(granges)@lengths[i]) {
       nToAdd <- seqnames(granges)@lengths[i] - length(curr.ir)
       temp.ir <- IRanges(start=as.integer(runif(nToAdd, min=1, 
max=seqlengths(granges)[seqnames(granges)@values[i]]-(width -1))), 
width=width)
       curr.ir <- c(curr.ir, temp.ir)
       curr.ir <- asNormalIRanges(curr.ir)
       curr.ir <- curr.ir[width(curr.ir) ==width]
       curr.ir <- IRanges(curr.ir)
     }
     random.gr <- c(random.gr, 
GRanges(seqnames=rep(seqnames(granges)@values[i], 
seqnames(granges)@lengths[i]), ranges=asNormalIRanges(curr.ir)))
   }
   seqlengths(random.gr) <- seqlengths(granges)
   return(random.gr)
}

On 02/12/10 15:23, Michael Dondrup wrote:
> Hi,
>
> I don't know about a specific function (maybe there is) so I use the following code to simulate ranges.
> I simplified it a bit so it might not directly run but shows the concept.
>
> simulate.GRanges<-  function (n, mean.length, genome.size ){
>   # assuming read lengths are poisson distributed around the mean length, you could use a different random function
>   # assuming reads are uniformly distributed wrt. start position
>    ir = IRanges(start=as.integer(runif(n, min=1, max=genome.size)), width=rpois(n, mean.length))
>    ir = restrict(ir, 1, genome.size) # cut off reads at the end, not perfect
>    strand = Rle(sample(x=c("+","-", "*"), size=n, replace=TRUE)) # make some strand information
>    gr = GRanges(ir , seqnames="chr1",
>      seqlengths=genome.size,
>      strand=strand, universe="Mygenome")
>    return(gr)
> }
>
> You just need to get an estimate of read-length distribution from somewhere then.
>
> Hope this is useful
>
> Michael
>
> On Dec 2, 2010, at 4:06 PM, Anna Terry wrote:
>
>    
>> Hi,
>>
>> Is there any function in GenomicRanges (or other package) for getting a set of random ranges based on size distribution of a sample set?
>> Similar functions are available in bedTools( http://code.google.com/p/bedtools/wiki/Usage#shuffleBed) and Random Intervals from the ENCODE tools on Galaxy.
>>
>> Many Thanks
>> Anna
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>      
>    

-- 
Anna Terry
Lymphocyte Development
MRC Clinical Sciences Centre
Imperial College Faculty of Medicine
Hammersmith Hospital
Du Cane Road
London W12 0NN

Tel:	0208 3832140
	0208 3832145
Email: anna.terry at csc.mrc.ac.uk



More information about the Bioc-sig-sequencing mailing list