[BioC] working with genome-wide phastCons scores

Steve Lianoglou lianoglou.steve at gene.com
Wed Oct 23 18:55:21 CEST 2013


Hi Robert,

On Wed, Oct 23, 2013 at 9:03 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
> dear list,
>
> I have to work pretty intensively with genome-wide phastCons scores, and
> instead of repeatedly interrogating them over the internet via the UCSC
> genome browser with 'rtracklayer', I'd prefer to do a bulk download of the
> *.phastCons46way.wigFix.gz files (about 0.6 GB) at
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vertebrate
>
> and then import them into R, storing the information in some memory-efficient
> data structure (Rle?) that also gives me an efficient way to query the
> phastCons score at any position of the human genome.
>
> All the documentation and messages I've been able to find through Google and
> the BioC list correspond to use cases involving a small fraction of the
> genome, which 'rtracklayer' can handle over an internet connection.
>
> Any hint on how to achieve this goal will be very much appreciated,
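The question boils down to two pieces: reading the fixedStep wiggle records that (as far as I know) the *.wigFix.gz files contain, and storing them run-length encoded, which is the idea behind an Rle, so that any single position can be looked up quickly. The following is a minimal Python sketch of that idea only, not an R/Bioconductor recipe. `ScoreTrack` and `parse_fixed_step` are hypothetical names, the sample records are made up, and the code assumes each chromosome's fixedStep blocks appear in ascending coordinate order.

```python
import bisect
import io
from array import array


class ScoreTrack:
    """Run-length encoded scores for one chromosome (hypothetical helper).

    Runs are closed, 1-based intervals [start, end] sharing one score,
    similar in spirit to an Rle. Typed arrays store raw machine values,
    which keeps per-run overhead lower than lists of Python objects.
    """

    def __init__(self):
        self.starts = array("q")
        self.ends = array("q")
        self.values = array("d")

    def add(self, start, end, value):
        # Extend the previous run when it is contiguous and has the same score.
        if self.values and self.ends[-1] + 1 == start and self.values[-1] == value:
            self.ends[-1] = end
        else:
            self.starts.append(start)
            self.ends.append(end)
            self.values.append(value)

    def score_at(self, pos):
        """Score covering 1-based position pos, or None if not covered."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0 and pos <= self.ends[i]:
            return self.values[i]
        return None


def parse_fixed_step(lines):
    """Parse fixedStep wiggle text into {chrom: ScoreTrack}.

    Assumes blocks for each chromosome appear in ascending coordinate order.
    """
    tracks = {}
    track = None
    pos = step = span = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Header such as "fixedStep chrom=chr1 start=10918 step=1".
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            track = tracks.setdefault(fields["chrom"], ScoreTrack())
            pos = int(fields["start"])
            step = int(fields.get("step", "1"))
            span = int(fields.get("span", "1"))
        elif track is None:
            raise ValueError("data line before any fixedStep header")
        else:
            # Each value covers `span` bases; the next value starts `step` later.
            track.add(pos, pos + span - 1, float(line))
            pos += step
    return tracks


# Usage with a tiny made-up fixedStep snippet (values are illustrative only).
example = io.StringIO(
    "fixedStep chrom=chr1 start=10918 step=1\n"
    "0.064\n0.064\n0.058\n"
    "fixedStep chrom=chr1 start=11000 step=1\n"
    "0.5\n"
)
tracks = parse_fixed_step(example)
print(tracks["chr1"].score_at(10919))  # 0.064
print(tracks["chr1"].score_at(10950))  # None (falls between the two blocks)
```

For a downloaded file, passing `gzip.open(path, "rt")` instead of the `StringIO` would stream it line by line; whether a whole genome fits comfortably in memory this way depends on how well the scores collapse into runs.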

I previously did this by downloading the wig files and converting them
to bigWig format. To do so, I used the wigToBigWig tool, which you can
eventually get to from here:

http://genomewiki.ucsc.edu/index.php/Kent_source_utilities

It is unfortunate that UCSC doesn't host the bigWig files, as
converting from wig to bigWig is hugely memory-intensive (last I
remember, anyway).

(Update: not sure when this happened, but rtracklayer actually wraps
this functionality in its wigToBigWig function -- nice! -- so you can
convert the wig file to bigWig straight from R (assuming you have a
machine with enough horsepower))

After that, you can use rtracklayer's import functions on the
bigWig files to query them with a GRanges object. Look at the
Examples section of the ?import.bw man page for some help.
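Querying with a range rather than a single position amounts to intersecting the range with the stored runs of scores. As a rough illustration of that idea only (not how rtracklayer or the bigWig index works internally), here is a Python sketch that averages the scores over the covered bases of a closed, 1-based range, the same coordinate convention GRanges uses. `covered_mean` and the sample runs are hypothetical.

```python
import bisect


def covered_mean(starts, ends, values, qstart, qend):
    """Mean score over the covered bases of the closed 1-based range [qstart, qend].

    starts/ends/values describe sorted, non-overlapping runs (hypothetical
    layout). Returns (mean, n_covered); mean is None if nothing is covered.
    """
    total = 0.0
    covered = 0
    # Begin at the last run starting at or before qstart (or the first run).
    i = max(bisect.bisect_right(starts, qstart) - 1, 0)
    while i < len(starts) and starts[i] <= qend:
        # Clip the run to the query range; skip runs that end before qstart.
        lo = max(starts[i], qstart)
        hi = min(ends[i], qend)
        if lo <= hi:
            n = hi - lo + 1
            total += n * values[i]
            covered += n
        i += 1
    return (total / covered if covered else None), covered


# Hypothetical runs: [10918, 10919] -> 0.064, [10920, 10920] -> 0.058, [11000, 11000] -> 0.5
starts, ends, values = [10918, 10920, 11000], [10919, 10920, 11000], [0.064, 0.058, 0.5]
mean, n = covered_mean(starts, ends, values, 10918, 10920)
print(n)  # 3
print(covered_mean(starts, ends, values, 10950, 10990))  # (None, 0)
```

Starting from a binary search keeps the cost proportional to the number of runs the range touches rather than to the length of the chromosome.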

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech