[Bioc-devel] as.character method for GenomicRanges?
Peter Haverty
haverty.peter at gene.com
Fri Apr 24 20:26:44 CEST 2015
Going the other way can look like this:
##' Parse one or more location strings and return as a GRanges
##'
##' Parse one or more location strings and return as a GRanges. The GRanges
##' will get its names from the names of location.string.
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
    ## Take location strings such as "chr11:123-127" and return a GRanges.
    ## The "11:123..456" form is only accepted if the TWU-style line below
    ## is uncommented.
    ## gsub (not sub) so that every space and every comma is removed
    location.string = gsub("\\s+", "", location.string)
    location.string = gsub(",", "", location.string)
    #location.string = sub("\\.\\.", "-", location.string) # TWU style location strings
    if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
        stop("Some location strings do not look like chr1:123-456.") }
    start = as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
    stop = as.integer(sub("^.+-(\\d+)$", "\\1", location.string))
    ## "(chr){0,1}" makes the whole "chr" prefix optional; "chr{0,1}" only
    ## made the "r" optional, so "11:123-456" was returned unchanged.
    gr = GRanges( IRanges(
        start=pmin(start, stop),
        end=pmax(start, stop),
        names=names(location.string))
        , seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", location.string) )
    return(gr)
}
Surprisingly the repeated subs are faster than splitting. Some people,
such as GSNAP author Tom Wu, use the format "chr1:1234..1235", which we
might want to support. The pmin/pmax stuff handles cases where the negative
strand is expressed by flipping start and stop. We might not need that.
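
For illustration, the two cases mentioned above (the "chr1:1234..1235" style
and flipped start/stop) could be handled along these lines. This is only a
base-R sketch, not the function above: parse_locstring and its flipped column
are hypothetical names, it returns a data.frame rather than a GRanges so that
it runs without GenomicRanges, and it does not parse a trailing strand.

```r
# Hypothetical helper (not from the original post): parse location strings
# into a data.frame using base R only.
parse_locstring <- function(x) {
    # Drop all whitespace and thousands separators.
    x <- gsub("[[:space:],]", "", x)
    # Accept the "chr1:1234..1235" style by turning ".." into "-".
    x <- sub("..", "-", x, fixed = TRUE)
    ok <- grepl("^[^:]+:[0-9]+-[0-9]+$", x)
    if (!all(ok)) {
        stop("Some location strings do not look like chr1:123-456.")
    }
    first  <- as.integer(sub("^[^:]+:([0-9]+)-[0-9]+$", "\\1", x))
    second <- as.integer(sub("^[^:]+:[0-9]+-([0-9]+)$", "\\1", x))
    data.frame(
        seqnames = sub(":.*$", "", x),
        start    = pmin(first, second),   # tolerate flipped coordinates
        end      = pmax(first, second),
        flipped  = first > second,        # TRUE where start and stop were swapped
        stringsAsFactors = FALSE
    )
}

locs <- parse_locstring(c("chr1:1234..1235", "chr2:500-100", "11:1,000-2,000"))
```

Only the second row of locs is flagged as flipped; a caller could map that
flag to the "-" strand if that convention is wanted.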
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
On Fri, Apr 24, 2015 at 11:08 AM, Peter Haverty <phaverty at gene.com> wrote:
> Good catch. We'll want the strand in case we need to go back to a GRanges.
> I would make the strand addition optional with the default of FALSE. It's
> nice to have a column of strings you can paste right into a genome browser
> (sorry Michael :-) ). I often pass my bench collaborators a spreadsheet
> with such a column.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phaverty at gene.com
>
> On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>> On 04/24/2015 10:21 AM, Michael Lawrence wrote:
>>
>>> Sorry, one more concern, if you're thinking of using as a range key, you
>>> will need the strand, but many use cases might not want the strand on
>>> there. Like for pasting into a genome browser.
>>>
>>
>> What about appending the strand only for GRanges objects that
>> have at least one range that is not on *?
>>
>> setMethod("as.character", "GenomicRanges",
>> function(x)
>> {
>> if (length(x) == 0L)
>> return(character(0))
>> ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
>> if (any(strand(x) != "*"))
>> ans <- paste0(ans, ":", strand(x))
>> ans
>> }
>> )
>>
>> > as.character(gr)
>> [1] "chr1:1-10" "chr2:2-10" "chr2:3-10" "chr2:4-10" "chr1:5-10"
>> [6] "chr1:6-10" "chr3:7-10" "chr3:8-10" "chr3:9-10" "chr3:10-10"
>>
>> > strand(gr)[2:3] <- c("-", "+")
>> > as.character(gr)
>> [1] "chr1:1-10:*" "chr2:2-10:-" "chr2:3-10:+" "chr2:4-10:*" "chr1:5-10:*"
>> [6] "chr1:6-10:*" "chr3:7-10:*" "chr3:8-10:*" "chr3:9-10:*" "chr3:10-10:*"
>>
>> H.
>>
>>
>>> On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence <michafla at gene.com> wrote:
>>>
>>> It is a great idea, but I'm not sure I would use it to implement
>>> table(). Allocating those strings will be costly. Don't we already
>>> have the 4-way int hash? Of course, my intuition might be completely
>>> off here.
>>>
>>>
>>> On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>
>>> Hi Pete,
>>>
>>> Excellent idea. That will make things like table() work out-of-the-box
>>> on GenomicRanges objects. I'll add that.
>>>
>>> Thanks,
>>> H.
>>>
>>>
>>>
>>> On 04/24/2015 09:43 AM, Peter Haverty wrote:
>>>
>>> Would people be interested in having this:
>>>
>>> setMethod("as.character", "GenomicRanges",
>>> function(x) {
>>> paste0(seqnames(x), ":", start(x), "-",
>>> end(x))
>>> })
>>>
>>> ?
>>>
>>> I find myself doing that a lot to make unique names or for output that
>>> goes to collaborators. I suppose we might want to tack on the strand if it
>>> isn't "*". I have some code for going the other direction too, if there is
>>> interest.
>>>
>>>
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phaverty at gene.com
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>