[BioC] phastCon-scores

Sean Davis seandavi at gmail.com
Wed Dec 16 02:25:39 CET 2009


On Tue, Dec 15, 2009 at 7:56 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Tue, Dec 15, 2009 at 2:21 PM, Sean Davis <seandavi at gmail.com> wrote:
>>
>> On Tue, Dec 15, 2009 at 5:01 PM, Johannes Waage <johannes.waage at bric.dk>
>> wrote:
>> > Hi all,
>> >
>> > I have a small but important challenge that I've been unable to solve.
>> > I need to aggregate all phastCons scores for 75-100 nt around all *mus*
>> > exon splice sites. I've tried different approaches: downloading the
>> > entire multiz30way phastCons dataset from UCSC (too big to work with
>> > smoothly), downloading via intersect with the UCSC Table Browser and
>> > Galaxy (limits me to 10 million data points, unfortunately), and
>> > fetching data through rtracklayer (too slow). Can anyone point me
>> > towards an elegant and fast way to fetch data points for many genomic
>> > intervals? With around 22k genes, an average exon count of 8, and
>> > 100 nt per exon, it seems I need to be able to fetch around 20m data
>> > points.
>> >
>> > I need to use the data as background in comparison to select upregulated
>> > exons in an RNA-seq splice study.
>>
>> Could you do this chromosome-by-chromosome by loading the per-base
>> data one chromosome at a time from the files into an R vector and then
>> using normal vector subsetting to get the regions of interest?
>>
>> Alternatively, with a little work, you could probably also build a
>> little index file and then use random access to get the data from the
>> files.
>>
>> Finally, there are probably some tools in the UCSC browser tool chain
>> that you could download to deal with conservation data fairly quickly.
>>
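
The chromosome-at-a-time idea quoted above can be sketched roughly as follows. This is only an illustration in Python, not tested against the real downloads: it assumes the per-chromosome files are plain-text fixedStep wiggle, and the function names (load_fixedstep, region_scores) and the toy data are made up for the example.

```python
import io
import math

def load_fixedstep(lines, chrom_length):
    """Load fixedStep wiggle lines for ONE chromosome into a per-base list.

    Bases without a score stay NaN. List index 0 holds base 1.
    """
    scores = [math.nan] * chrom_length
    pos = step = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Header, e.g. "fixedStep chrom=chr1 start=3 step=1"
            fields = dict(f.split("=") for f in line.split()[1:])
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
        else:
            scores[pos - 1] = float(line)
            pos += step
    return scores

def region_scores(scores, start, end):
    """Scores for the 1-based, inclusive interval [start, end]."""
    return scores[start - 1:end]

# Toy stand-in for one chromosome file (hypothetical data).
toy = io.StringIO(
    "fixedStep chrom=chr1 start=3 step=1\n0.1\n0.2\n0.3\n"
    "fixedStep chrom=chr1 start=8 step=1\n0.9\n0.8\n"
)
chr1 = load_fixedstep(toy, chrom_length=10)
window = region_scores(chr1, 3, 5)
```

For a real mouse chromosome a plain list would be memory-hungry, so a packed numeric array would probably be the better container. The point is only that once one chromosome is in memory, every splice-site window on it is a cheap slice.
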
>
> This may be a decent use case for bigWig support in Bioconductor. The data
> is stored in a binary, indexed form, so it should be easy and efficient to
> bring subsets into memory/R.
>
> The mappability tracks are another example. Looks like rtracklayer may be
> the place for this, at least initially.  The mythical common IO package
> would be helpful though.

I agree that bigWig support would be a useful addition to the
Bioconductor tool set.
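
To illustrate why a binary, position-addressable layout makes region lookups cheap, here is a toy sketch in Python. It is not bigWig (which has its own compressed, indexed format); it assumes a made-up layout of one 32-bit float per base, so the "index" is just arithmetic on the position. The file name and helper names are hypothetical.

```python
import os
import struct
import tempfile

# One little-endian 32-bit float per base (assumed toy layout, not bigWig).
RECORD = struct.Struct("<f")

def write_scores(path, scores):
    """Write per-base scores as fixed-width binary records."""
    with open(path, "wb") as fh:
        for s in scores:
            fh.write(RECORD.pack(s))

def read_region(path, start, end):
    """Read the 1-based, inclusive interval [start, end] by seeking.

    Only the bytes for the requested bases are read from disk.
    """
    n = end - start + 1
    with open(path, "rb") as fh:
        fh.seek((start - 1) * RECORD.size)
        data = fh.read(n * RECORD.size)
    return [v[0] for v in RECORD.iter_unpack(data)]

# Toy chromosome of five bases, written to a temporary file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "chr_toy.f32")
write_scores(path, [0.0, 0.25, 0.5, 0.75, 1.0])
subset = read_region(path, 2, 4)
```

With a layout like this, fetching 20 million points spread over many short intervals costs one seek and one small read per interval, instead of a pass over the whole dataset.
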

Sean
