[Bioc-devel] Block bootstrap for GenomicRanges

Michael Love mich@eli@@i@hlove @ending from gm@il@com
Mon Aug 13 22:39:21 CEST 2018


dear Bioc-devels,

>From a conversation on Twitter [1], I've been wondering if there has
been any work to implement block bootstrap for GRanges.

I guess that's my main question, and then the rest is details and
ideas in case it hasn't already been done.

The block bootstrap helps to create perhaps more realistic null data,
in way that deals with clumping of features. You take many random,
potentially overlapping blocks from the original data (ranges) to fill
up a new set of data (ranges). There's a 2011 paper from Peter Bickel
about this topic [2], which had some python software associated with
it.

Aside from the statistical considerations, one would want to do each
bootstrap quickly so you can make many bootstrap samples, and so
having vectorized code would be an obvious desirable thing. I was
thinking about how the Views infrastructure is kind of what one would
want, but it operates on Rle's if I remember correctly, and the
desired output here would be ranges. Below is kind of getting what I
want, as a hack I'd then convert the Rle's back to ranges and
concatenate them if I wanted 5 x 40bp = 200 bp of bootstrap data.

> ir <- IRanges(c(51,61,71,111,121,131,200),width=5) # make some clumpy data
> Views(coverage(ir), start=round(runif(5,1,150)), width=40) # e.g. 5 random 40 bp blocks
Views on a 204-length Rle subject

views:
    start end width
[1]    97 136    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
1 1 1 1 0 0 0 0 0 1 1 ...]
[2]    50  89    40 [0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0 ...]
[3]    18  57    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 ...]
[4]     8  47    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 ...]
[5]    97 136    40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
1 1 1 1 0 0 0 0 0 1 1 ...]

Another option would be to convert to time-series data, then use an
existing package like 'boot' then convert back to ranges.

[1] https://twitter.com/mikelove/status/1027296115346092032
[2] https://arxiv.org/abs/1101.0947



More information about the Bioc-devel mailing list