[BioC] import.bw to Rle?

Janet Young jayoung at fhcrc.org
Sat Aug 13 03:13:02 CEST 2011


Hi,

Is there an easy way to read in a bigWig file as Rle values appropriately placed along the chromosomes?   Specifically, I'm using rtracklayer's import.bw to read in a mapability track I downloaded from UCSC and I get a RangedData object, as expected. Mapability scores are calculated as per base-pair basis, so it seems obvious to me to covert the scores to Rle.

###
library(rtracklayer)

## built-in example
test.bw.file <- system.file("tests", "test.bw", package = "rtracklayer")
bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19", IRanges(1, 6e7)))
bw
 RangedData with 9 rows and 1 value column across 1 space
     space               ranges |     score
  <factor>            <IRanges> | <numeric>
1    chr19 [59302001, 59302300] |     -1.00
2    chr19 [59302301, 59302600] |     -0.75
3    chr19 [59302601, 59302900] |     -0.50
4    chr19 [59302901, 59303200] |     -0.25
5    chr19 [59303201, 59303500] |      0.00
6    chr19 [59303501, 59303800] |      0.25
7    chr19 [59303801, 59304100] |      0.50
8    chr19 [59304101, 59304400] |      0.75
9    chr19 [59304401, 59304700] |      1.00

##my example using a whole-genome mapability file I got from 
# http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMapability/

### for now just look at one small region, but eventually I'll read in data for the whole genome
testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963, 57797963))
mapabilityTrack_small <-  import.bw("/home/jayoung/traskdata/public_databases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw", selection = BigWigSelection(testRegion) )

mapabilityTrack_small
RangedData with 92 rows and 1 value column across 49 spaces
       space               ranges   |      score
    <factor>            <IRanges>   |  <numeric>
1      chr12 [57795963, 57796961]   |  1.0000000
2      chr12 [57796962, 57796963]   |  0.5000000
3      chr12 [57796964, 57796969]   |  1.0000000
4      chr12 [57796970, 57796971]   |  0.5000000
5      chr12 [57796972, 57796984]   |  1.0000000
### etc

head(width(mapabilityTrack_small),20)
#  [1] 999   2   6   2  13   1   1   1  19   3   1   1   1   1   2   1   1   1   1
#  [20]   4

####

A RangedData object with scores makes sense for some use cases but for many others it seems like RleList would make more sense (thinking about the types of data that will be encoded in bigWig files, like mapability, where you get a per-base pair score).   

I think I will be able to figure out a way to convert the RangedData scores - it'll be something like this, but maybe also padding with NA at the end to go the full length of each chromosome, and applying the function over all chromosomes. 

Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) ) 
# 'numeric' Rle of length 59304700 with 10 runs
#   Lengths: 59302000      300      300      300      300      300      300      300      300      300
#   Values :       NA       -1    -0.75     -0.5    -0.25        0     0.25      0.5     0.75        1

Then I'll be able to use seqselect on the Rle to get scores in regions I'm interested in, average the scores per region, average scores at each position in equivalent regions, etc, etc.

But I did wonder whether I'm missing a more obvious solution to convert to Rle - is there already a function that'll do this for me?  If not, could I suggest that as an enhancement for import.bw?  (am I just being lazy?)

thanks, 

Janet

------------------------------------------------------------------- 

Dr. Janet Young 

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org


------------------------------------------------------------------- 



More information about the Bioconductor mailing list