I've thought about this a bit more: coverage() works, but by default it
assumes that positions with no ranges have coverage = 0. Would it be
possible to have an argument for coverage() that allows regions with no
coverage to be given some arbitrary value like NA? Two reasons: (1) for a
metadata column that can take positive and negative values, there's no way
to distinguish no coverage from a true 0, and (2) missingness along a
sequence is an important variable too. Maybe this topic needs to be shifted
to bioc-devel (or I'm missing another piece).
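
In the meantime, a workaround seems possible with the existing functions.
The sketch below is only a rough idea, not something I've run on real data.
It assumes that a second, unweighted coverage() call is an acceptable way to
find uncovered positions, and the toy object `gr` (hypothetical, with a
metadata column named `variable`) stands in for real ranges. Positions where
the unweighted coverage is 0 are masked with NA, so a true 0 in the metadata
column survives:

```r
library(GenomicRanges)

# Hypothetical toy data: three 1-width ranges on an 8 bp "chr1".
# The first range carries a true 0 in the metadata column.
gr <- GRanges("chr1",
              IRanges(start = c(2, 5, 6), width = 1),
              variable = c(0, -1.5, 2),
              seqlengths = c(chr1 = 8))

# Weighted coverage: sum of `variable` over overlapping ranges,
# and 0 wherever no range overlaps.
weighted <- coverage(gr, weight = "variable")[["chr1"]]

# Unweighted coverage: number of ranges overlapping each position.
depth <- coverage(gr)[["chr1"]]

# Mask positions that no range covers, leaving true zeros untouched.
weighted[depth == 0] <- NA_real_

as.vector(weighted)
```

With more than one chromosome, the same masking would have to be applied to
each element of the RleList that coverage() returns.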

Vince


On Fri, Jun 6, 2014 at 2:24 PM, Vince S. Buffalo <vsbuffalo@ucdavis.edu>
wrote:

> Terrific! I knew I might be missing a more obvious solution — I didn't
> know that coverage's weight argument could take a metadata column. Thanks
> Jeffrey and Michael!
>
> Vince
>
>
> On Fri, Jun 6, 2014 at 1:29 PM, Michael Lawrence <
> lawrence.michael@gene.com> wrote:
>
>> Note that if your statistic is per-base (like the mentioned GC content),
>> use XInteger and XIntegerViews for efficiency. But it looks like Vince has
>> gaps, in general at least.
>>
>> On Fri, Jun 6, 2014 at 1:11 PM, Johnston, Jeffrey <jjj@stowers.org>
>> wrote:
>>
>>> For GRanges with a metadata column, you can do:
>>>
>>> coverage(granges, weight="variable")
>>>
>>> This will produce an RleList where the value at each coordinate is the
>>> sum of the “variable” metadata column of all overlapping ranges. I think
>>> this will work for your use case if your ranges do not overlap.
>>>
>>> -Jeff
>>>
>>> On Jun 6, 2014, at 2:59 PM, Vince S. Buffalo <vsbuffalo@ucdavis.edu>
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > I'm thinking there might be a clever way to do something that I'm not aware
>>> > of. The setup is that I frequently find myself using Views and
>>> > viewMeans, viewSums, etc. to calculate summary statistics by tiles on
>>> > sequences. I have a GRanges object with 1-width ranges (but this should
>>> > apply more generally), and metadata columns have some measurement (GC
>>> > content, pairwise diversity, some quality metric, etc). I need to go from a
>>> > quantitative variable tied to specific ranges to an Rle sequence across an
>>> > entire chromosome to use Views/viewMeans (e.g. the binnedAverages example
>>> > in the How to). Right now I approach this with something like:
>>> >
>>> > data <- Rle(NA, length=seqlengths(txdb)['chr1'])
>>> > data[start(my_rngs)] <- my_rngs$variable # simple, since my features are 1-width
>>> >
>>> > # or more generally:
>>> > data2 <- Rle(NA, length=seqlengths(txdb)['chr1'])
>>> > data2[ranges(my_rngs)] <- my_rngs$variable
>>> > identical(as.vector(data), as.vector(data2)) # returns TRUE (contingent on all widths = 1)
>>> >
>>> > Then, I can convert these to Views on a set of bins/tiles created with
>>> > tileGenome, and use the viewMeans, viewSums, etc. functions (removing NAs).
>>> >
>>> > So my question is: are there better methods for creating sequence-length
>>> > Rle from metadata columns? Or another way of saying this is taking some
>>> > metadata column corresponding to ranges and mapping it to coordinate space
>>> > (maybe in one call)? It seems like if seqlengths is set in the GRanges
>>> > object, there's sufficient information to go directly from a GRanges
>>> > metadata column to an Rle vector (and I might be missing a more obvious
>>> > solution). My example assumed a single chromosome, but an approach that
>>> > knows to handle multiple sequences through RleLists seems like it would be
>>> > helpful.
>>> >
>>> > thanks,
>>> > Vince
>>> >
>>> > PS: My apologies if you've received this message twice, I had to resend
>>> > after it appears that I sent it to the wrong list.
>>> >
>>> > --
>>> > Vince Buffalo
>>> > Ross-Ibarra Lab (www.rilab.org)
>>> > Plant Sciences, UC Davis
>>> >
>>> > _______________________________________________
>>> > Bioconductor mailing list
>>> > Bioconductor@r-project.org
>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>
>
> --
> Vince Buffalo
> Ross-Ibarra Lab (www.rilab.org)
> Plant Sciences, UC Davis
>



-- 
Vince Buffalo
Ross-Ibarra Lab (www.rilab.org)
Plant Sciences, UC Davis
