[Bioc-devel] BigWigViews

Cook, Malcolm MEC at stowers.org
Tue Nov 19 20:57:55 CET 2013


I just went through this approach in yeast:
	regions: gene promoters
	assays: H3K4me1, H3K4me2, H3K4me3 ChIP-seq
	experimental conditions: 7 recombinant knock-outs and knock-ins of different domains of different genes
	two replicates

So, what I first reached for was something like the IRanges views pattern, but for a collection of bigWig files, such as we are discussing.

Not finding it in the wild, I rolled my own using existing BioC functionality. This let me curry a function down to each Rle corresponding to each region in each bigWig, ultimately returning a matrix of values.
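The roll-your-own approach described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the BioC code used in the post: `Rle` is a toy stand-in for the IRanges Rle class, each bigWig is assumed to be already imported as one in-memory coverage track, and `view_apply` is a hypothetical helper that applies a summary function to every region of every track and returns a regions-by-samples matrix.

```python
from __future__ import annotations

from bisect import bisect_left
from itertools import accumulate
from statistics import mean
from typing import Callable, Sequence


class Rle:
    """Toy run-length encoded vector (an assumed stand-in for an R Rle)."""

    def __init__(self, values: Sequence[float], lengths: Sequence[int]):
        if len(values) != len(lengths):
            raise ValueError("values and lengths must have the same length")
        self.values = list(values)
        self.lengths = list(lengths)
        # ends[i] is the 1-based position of the last base covered by run i
        self.ends = list(accumulate(self.lengths))

    def window(self, start: int, end: int) -> list[float]:
        """Expand positions start..end (1-based, inclusive) into a plain list."""
        if not 1 <= start <= end <= self.ends[-1]:
            raise IndexError("region outside the track")
        out: list[float] = []
        i = bisect_left(self.ends, start)  # first run ending at or after start
        pos = start
        while pos <= end:
            stop = min(self.ends[i], end)
            out.extend([self.values[i]] * (stop - pos + 1))
            pos = stop + 1
            i += 1
        return out


def view_apply(
    tracks: dict[str, Rle],
    regions: list[tuple[int, int]],
    fn: Callable[[list[float]], float],
) -> tuple[list[str], list[list[float]]]:
    """Apply fn to every region of every track; rows = regions, columns = tracks."""
    names = list(tracks)
    matrix = [[fn(tracks[n].window(s, e)) for n in names] for s, e in regions]
    return names, matrix


# Hypothetical sample names and coverage runs, for illustration only.
tracks = {
    "wt_rep1": Rle([0, 5, 2], [10, 5, 10]),
    "ko_rep1": Rle([1, 3], [12, 13]),
}
names, matrix = view_apply(tracks, [(1, 10), (8, 15)], mean)
```

Because each region is expanded to per-base values before the summary function is applied, any summary (mean, max, a quantile) can be plugged in, at the cost of memory proportional to region width.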

This worked, but in retrospect I think my needs would have been better served by serializing the results for each bigWig and saving them in a corresponding file.


That is because I now find myself needing to slice and dice different subsets of samples and assays, and also to add new samples from more recent experiments.

The one thing that has NOT changed is the list of regions (in my case, gene promoters).

If I instead adopt an approach where I create one serialized (.RDS or .csv) file per bigWig, storing the tabulation of my summary function at the promoter level, I can easily load just the ones I need, in any combination, for a later analysis. I can even load them into a multi-dimensional array, indexed by, say, [sample, assay].
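The one-file-per-bigWig scheme described above might look like the following minimal sketch, assuming CSV output (one of the two formats mentioned) with one `promoter,value` row per promoter. `save_summary`, `load_summary`, and `load_array` are hypothetical helpers, the sample, assay, and promoter names are made up, and the "array" is a dict keyed by (sample, assay) tuples rather than a true multi-dimensional array.

```python
from __future__ import annotations

import csv
import os
import tempfile


def save_summary(path: str, summary: dict[str, float]) -> None:
    """Write one bigWig's per-promoter summary as a two-column CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["promoter", "value"])
        for promoter, value in summary.items():
            writer.writerow([promoter, value])


def load_summary(path: str) -> dict[str, float]:
    """Read a file written by save_summary back into a promoter -> value dict."""
    with open(path, newline="") as fh:
        return {row["promoter"]: float(row["value"]) for row in csv.DictReader(fh)}


def load_array(
    files: dict[tuple[str, str], str], promoters: list[str]
) -> dict[tuple[str, str], list[float]]:
    """Load only the requested (sample, assay) files.

    Each entry holds one value per promoter, in the order of `promoters`
    (the fixed region list shared by every file).
    """
    array: dict[tuple[str, str], list[float]] = {}
    for key, path in files.items():
        summary = load_summary(path)  # read each file once
        array[key] = [summary[p] for p in promoters]
    return array


# Demo with hypothetical names and values, written to a throwaway directory.
promoters = ["prom1", "prom2"]
with tempfile.TemporaryDirectory() as tmpdir:
    files = {}
    for sample, assay, vals in [("wt", "me3", [4.0, 1.5]), ("ko", "me3", [0.5, 1.0])]:
        path = os.path.join(tmpdir, f"{sample}_{assay}.csv")
        save_summary(path, dict(zip(promoters, vals)))
        files[(sample, assay)] = path
    array = load_array(files, promoters)
    # Any subset of files and promoters can be loaded on its own.
    subset = load_array({("ko", "me3"): files[("ko", "me3")]}, ["prom2"])
```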

What I think I want now, rather, is an interface to a multi-dimensional (virtual) array, where two of the dimensions (say, assay and sample) determine the file path containing the third dimension, that being the computed and serialized value at each promoter.
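The virtual array described above, where assay and sample select a file and the file supplies the per-promoter values, could be sketched as a lazily loading class. This is an illustration in Python under assumptions, not an existing Bioconductor interface: `VirtualSummaryArray`, the `{assay}_{sample}.csv` path template, and the `promoter,value` file layout are all hypothetical.

```python
from __future__ import annotations

import csv
import os
import tempfile


class VirtualSummaryArray:
    """A lazily loaded (assay x sample x promoter) array.

    The (assay, sample) pair only selects a file via `path_template`; the file
    holds the third dimension, one serialized value per promoter. Files are
    read on first access and cached.
    """

    def __init__(self, path_template: str, assays: list[str],
                 samples: list[str], promoters: list[str]):
        self.path_template = path_template  # must contain {assay} and {sample}
        self.assays = list(assays)
        self.samples = list(samples)
        self.promoters = list(promoters)
        self._cache: dict[tuple[str, str], list[float]] = {}

    def __getitem__(self, key: tuple[str, str]) -> list[float]:
        assay, sample = key
        if assay not in self.assays or sample not in self.samples:
            raise KeyError(key)
        if (assay, sample) not in self._cache:
            path = self.path_template.format(assay=assay, sample=sample)
            with open(path, newline="") as fh:
                values = {r["promoter"]: float(r["value"]) for r in csv.DictReader(fh)}
            self._cache[(assay, sample)] = [values[p] for p in self.promoters]
        return self._cache[(assay, sample)]

    def subset(self, assays: list[str], samples: list[str]) -> dict[str, dict[str, list[float]]]:
        """Slice any combination of assays and samples as {assay: {sample: values}}."""
        return {a: {s: self[a, s] for s in samples} for a in assays}


# Demo: hypothetical files; note that no file exists for ("me1", "ko").
with tempfile.TemporaryDirectory() as tmpdir:
    promoters = ["prom1", "prom2"]
    for assay, sample, vals in [("me1", "wt", [1.0, 2.0]),
                                ("me3", "wt", [3.0, 4.0]),
                                ("me3", "ko", [0.0, 0.5])]:
        with open(os.path.join(tmpdir, f"{assay}_{sample}.csv"), "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["promoter", "value"])
            writer.writerows(zip(promoters, vals))
    arr = VirtualSummaryArray(os.path.join(tmpdir, "{assay}_{sample}.csv"),
                              ["me1", "me3"], ["wt", "ko"], promoters)
    loaded_before = len(arr._cache)
    me3 = arr.subset(["me3"], ["wt", "ko"])
    loaded_after = len(arr._cache)
```

Only the files actually indexed are read, so adding new samples amounts to writing new files and extending the `samples` list, while the promoter dimension stays fixed.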

Your mileage may vary.

~ Malcolm

 >Retrieving the data for a genomic range is efficient; doing this for
 >thousands of samples might get tricky, but it could probably be vectorized
 >through clever use of matrices. But millions of regions by thousands of
 >samples might need some support in native code, along the lines of
 >viewSums, etc., but iterating over the bigwigs directly. Maybe you guys
 >could implement something in R, and then we could profile and optimize it.
 >On Mon, Nov 18, 2013 at 4:33 PM, Kasper Daniel Hansen <
 >kasperdanielhansen at gmail.com> wrote:
 >> (Michael Love and I had some discussion on this Friday)
 >> I also think it would be a very convenient class/method.  A lot of data
 >> these days are naturally represented (and are available from say GEO) as
 >> bigWig files (essentially coverage tracks), for example ChIP-seq.  This
 >> would be much more efficient than converting BAM to coverage on the fly.
 >> It seems to me that bigWig ought to be efficient for this, but I am not
 >> very familiar with its performance.  What we want is really to be able to
 >> chunk multiple coverage profiles over the genome, and do computations on
 >> each of the chunks.  Any idea on efficiency?  I am happy to contribute a
 >> bit, at least with design.
 >> Best,
 >> Kasper
 >> On Mon, Nov 18, 2013 at 6:11 PM, Michael Lawrence <
 >> lawrence.michael at gene.com> wrote:
 >>> Aggregating coverage over multiple samples is a popular request recently.
 >>> I'm happy to support this effort, but I think someone in Seattle is going
 >>> to have to take the lead on it.
 >>> On Mon, Nov 18, 2013 at 2:36 PM, Michael Love
 >>> <michaelisaiahlove at gmail.com> wrote:
 >>> > a discussion came up on devel last year about looking at a genomic range
 >>> > over multiple samples and multiple experiments
 >>> > (https://stat.ethz.ch/pipermail/bioc-devel/attachments/20120920/93a4fb61/attachment.pl).
 >>> >
 >>> > Setting aside the multiple-experiment part, I'm interested in
 >>> > BigWigViews() with fixed ranges across samples. Have there been any more
 >>> > thoughts in this direction?
 >>> >
 >>> > BigWigViews would be incredibly useful for genomics applications where we
 >>> > want to scan along the genome looking at lots of samples. BigWig offers a
 >>> > more concise representation of the information than BAM files do.
 >>> >
 >>> > What I am trying now is using import(BigWigFile, which=gr) on files one by
 >>> > one, and then binding the coverage together.
 >>> >
 >>> > best,
 >>> >
 >>> > Mike
 >>> >
 >>> > _______________________________________________
 >>> > Bioc-devel at r-project.org mailing list
 >>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel

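The quoted reply above suggests that millions of regions by thousands of samples may need native code along the lines of viewSums, iterating over the bigWigs directly. As a rough illustration of the inner loop such code might run (a sketch, not rtracklayer or IRanges code), the Python below sums a run-length encoded coverage track over many regions using prefix sums and binary search instead of expanding the track to per-base values. `region_sums` and `sums_matrix` are hypothetical names, and the tracks are assumed to be already decoded into (values, lengths) runs rather than read from bigWig files.

```python
from __future__ import annotations

from bisect import bisect_left
from itertools import accumulate
from typing import Sequence


def region_sums(values: Sequence[float], lengths: Sequence[int],
                regions: Sequence[tuple[int, int]]) -> list[float]:
    """Sum a run-length encoded coverage track over many regions.

    Regions are 1-based, inclusive, and must lie within the track. Each sum
    costs two binary searches plus arithmetic; the track is never expanded.
    """
    ends = list(accumulate(lengths))  # last position covered by each run
    # mass[i] = total coverage over runs 0..i-1
    mass = [0] + list(accumulate(v * n for v, n in zip(values, lengths)))

    def prefix(pos: int) -> float:
        """Total coverage over positions 1..pos."""
        if pos <= 0:
            return 0
        i = bisect_left(ends, pos)  # index of the run containing pos
        run_start = ends[i] - lengths[i] + 1
        return mass[i] + values[i] * (pos - run_start + 1)

    return [prefix(end) - prefix(start - 1) for start, end in regions]


def sums_matrix(tracks, regions):
    """Regions-by-samples matrix; each track is a (values, lengths) pair."""
    columns = [region_sums(v, n, regions) for v, n in tracks]
    return [list(row) for row in zip(*columns)]


# Two hypothetical samples and three regions, for illustration only.
tracks = [([0, 5, 2], [10, 5, 10]), ([1, 3], [12, 13])]
regions = [(1, 10), (8, 15), (1, 25)]
matrix = sums_matrix(tracks, regions)
```

The per-region cost depends on the number of runs only logarithmically, which is the property that would matter when the region count reaches the millions.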