[Bioc-devel] Block bootstrap for GenomicRanges
Michael Love
mich@eli@@i@hlove @ending from gm@il@com
Mon Aug 13 22:39:21 CEST 2018
dear Bioc-devels,
>From a conversation on Twitter [1], I've been wondering if there has
been any work to implement block bootstrap for GRanges.
I guess that's my main question, and then the rest is details and
ideas in case it hasn't already been done.
The block bootstrap helps to create perhaps more realistic null data,
in way that deals with clumping of features. You take many random,
potentially overlapping blocks from the original data (ranges) to fill
up a new set of data (ranges). There's a 2011 paper from Peter Bickel
about this topic [2], which had some python software associated with
it.
Aside from the statistical considerations, one would want to do each
bootstrap quickly so you can make many bootstrap samples, and so
having vectorized code would be an obvious desirable thing. I was
thinking about how the Views infrastructure is kind of what one would
want, but it operates on Rle's if I remember correctly, and the
desired output here would be ranges. Below is kind of getting what I
want, as a hack I'd then convert the Rle's back to ranges and
concatenate them if I wanted 5 x 40bp = 200 bp of bootstrap data.
> ir <- IRanges(c(51,61,71,111,121,131,200),width=5) # make some clumpy data
> Views(coverage(ir), start=round(runif(5,1,150)), width=40) # e.g. 5 random 40 bp blocks
Views on a 204-length Rle subject
views:
start end width
[1] 97 136 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
1 1 1 1 0 0 0 0 0 1 1 ...]
[2] 50 89 40 [0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0 ...]
[3] 18 57 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 ...]
[4] 8 47 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 ...]
[5] 97 136 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
1 1 1 1 0 0 0 0 0 1 1 ...]
Another option would be to convert to time-series data, then use an
existing package like 'boot' then convert back to ranges.
[1] https://twitter.com/mikelove/status/1027296115346092032
[2] https://arxiv.org/abs/1101.0947
More information about the Bioc-devel
mailing list