[BioC] import.bw to Rle?

Janet Young jayoung at fhcrc.org
Mon Aug 15 22:09:21 CEST 2011


thanks very much, Michael - that'll do it. 

Janet


On Aug 14, 2011, at 6:09 PM, Michael Lawrence wrote:

> 
> 
> On Fri, Aug 12, 2011 at 6:13 PM, Janet Young <jayoung at fhcrc.org> wrote:
> Hi,
> 
> Is there an easy way to read in a bigWig file as Rle values appropriately placed along the chromosomes?   Specifically, I'm using rtracklayer's import.bw to read in a mapability track I downloaded from UCSC and I get a RangedData object, as expected. Mapability scores are calculated as per base-pair basis, so it seems obvious to me to covert the scores to Rle.
> 
> ###
> library(rtracklayer)
> 
> ## built-in example
> test.bw.file <- system.file("tests", "test.bw", package = "rtracklayer")
> bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19", IRanges(1, 6e7)))
> bw
>  RangedData with 9 rows and 1 value column across 1 space
>     space               ranges |     score
>  <factor>            <IRanges> | <numeric>
> 1    chr19 [59302001, 59302300] |     -1.00
> 2    chr19 [59302301, 59302600] |     -0.75
> 3    chr19 [59302601, 59302900] |     -0.50
> 4    chr19 [59302901, 59303200] |     -0.25
> 5    chr19 [59303201, 59303500] |      0.00
> 6    chr19 [59303501, 59303800] |      0.25
> 7    chr19 [59303801, 59304100] |      0.50
> 8    chr19 [59304101, 59304400] |      0.75
> 9    chr19 [59304401, 59304700] |      1.00
> 
> ##my example using a whole-genome mapability file I got from
> # http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMapability/
> 
> ### for now just look at one small region, but eventually I'll read in data for the whole genome
> testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963, 57797963))
> mapabilityTrack_small <-  import.bw("/home/jayoung/traskdata/public_databases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw", selection = BigWigSelection(testRegion) )
> 
> mapabilityTrack_small
> RangedData with 92 rows and 1 value column across 49 spaces
>       space               ranges   |      score
>    <factor>            <IRanges>   |  <numeric>
> 1      chr12 [57795963, 57796961]   |  1.0000000
> 2      chr12 [57796962, 57796963]   |  0.5000000
> 3      chr12 [57796964, 57796969]   |  1.0000000
> 4      chr12 [57796970, 57796971]   |  0.5000000
> 5      chr12 [57796972, 57796984]   |  1.0000000
> ### etc
> 
> head(width(mapabilityTrack_small),20)
> #  [1] 999   2   6   2  13   1   1   1  19   3   1   1   1   1   2   1   1   1   1
> #  [20]   4
> 
> ####
> 
> A RangedData object with scores makes sense for some use cases but for many others it seems like RleList would make more sense (thinking about the types of data that will be encoded in bigWig files, like mapability, where you get a per-base pair score).
> 
> I think I will be able to figure out a way to convert the RangedData scores - it'll be something like this, but maybe also padding with NA at the end to go the full length of each chromosome, and applying the function over all chromosomes.
> 
> Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) )
> # 'numeric' Rle of length 59304700 with 10 runs
> #   Lengths: 59302000      300      300      300      300      300      300      300      300      300
> #   Values :       NA       -1    -0.75     -0.5    -0.25        0     0.25      0.5     0.75        1
> 
> Then I'll be able to use seqselect on the Rle to get scores in regions I'm interested in, average the scores per region, average scores at each position in equivalent regions, etc, etc.
> 
> But I did wonder whether I'm missing a more obvious solution to convert to Rle - is there already a function that'll do this for me?  If not, could I suggest that as an enhancement for import.bw?  (am I just being lazy?)
> 
> 
> 
> The unobvious solution is to call coverage(rd, weight = "score"). 
> 
> Michael
> 
>  
> thanks,
> 
> Janet
> 
> -------------------------------------------------------------------
> 
> Dr. Janet Young
> 
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
> 
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung  ...at...  fhcrc.org
> 
> 
> -------------------------------------------------------------------
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 



More information about the Bioconductor mailing list