[Bioc-devel] as.character method for GenomicRanges?

Peter Haverty haverty.peter at gene.com
Fri Apr 24 20:26:44 CEST 2015


Going the other way can look like this:

##' Parse one or more location strings and return as a GRanges
##'
##' Parse one or more location strings and return as a GRanges. GRanges
##' will get the names from the location.strings.
##' @param location.string character
##' @export
##' @return GRanges
##' @family location strings
locstring2GRanges <- function(location.string) {
  ## Take location strings like "chr11:123-127" or "11:123-127" and
  ## return a GRanges (requires the GenomicRanges package).
  ## "11:123..456" style needs the commented-out TWU line below.

  ## gsub rather than sub, so every space and comma is removed
  location.string = gsub("\\s+", "", location.string)
  location.string = gsub(",", "", location.string)
  #location.string = sub("\\.\\.","-",location.string)  # TWU style location strings

  if (any(! grepl("^(chr){0,1}.+:\\d+-\\d+$", location.string))) {
    stop("Some location strings do not look like chr1:123-456.")
  }
  start = as.integer(sub("^.+:(\\d+)-.+$", "\\1", location.string))
  stop = as.integer(sub("^.+-(\\d+)$", "\\1", location.string))
  ## "(chr){0,1}" makes the whole "chr" prefix optional; a bare "chr{0,1}"
  ## would only make the "r" optional and fail on strings like "11:123-456"
  gr = GRanges( IRanges(
    start=pmin(start, stop),
    end=pmax(start, stop),
    names=names(location.string))
    , seqnames=sub("^(chr){0,1}(.*):.*$", "\\2", location.string) )
  return(gr)
}

Surprisingly, the repeated sub() calls are faster than splitting the
strings.  Some people, such as GSNAP author Tom Wu, use the format
"chr1:1234..1235", which we might want to support. The pmin/pmax calls
handle cases where the negative strand is expressed by flipping start and
stop. We might not need that.
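
A minimal sketch of how the ".." style and the flipped-coordinate case could be handled together, written in base R only so it runs without GenomicRanges. The function name parseLocString and the "flipped" column are hypothetical and not part of any package; the sketch keeps seqnames exactly as given rather than stripping "chr".

```r
# Hypothetical helper (not from GenomicRanges): parse location strings into a
# data.frame, accepting both "chr1:123-456" and "chr1:123..456".
parseLocString <- function(location.string) {
  x <- gsub("[[:space:],]", "", location.string)  # drop all whitespace and commas
  x <- sub("..", "-", x, fixed = TRUE)            # ".." style becomes "-"
  pattern <- "^(.+):([0-9]+)-([0-9]+)$"
  bad <- !grepl(pattern, x)
  if (any(bad)) {
    stop("Cannot parse: ", paste(location.string[bad], collapse = ", "))
  }
  first  <- as.integer(sub(pattern, "\\2", x))
  second <- as.integer(sub(pattern, "\\3", x))
  data.frame(
    seqnames  = unname(sub(pattern, "\\1", x)),
    start     = pmin(first, second),   # smaller coordinate is always the start
    end       = pmax(first, second),
    flipped   = first > second,        # TRUE when the string listed end first
    row.names = names(location.string),
    stringsAsFactors = FALSE
  )
}

locs <- c(a = "chr11:123-127", b = "11:1,234..1,235", c = "chrX:500-400")
parsed <- parseLocString(locs)
print(parsed)
```

The "flipped" column records which strings had their coordinates reversed, so a caller could map those to the negative strand if that convention applies to their data. Names must be unique here because they become row names.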



Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Fri, Apr 24, 2015 at 11:08 AM, Peter Haverty <phaverty at gene.com> wrote:

> Good catch. We'll want the strand in case we need to go back to a GRanges.
> I would make the strand addition optional with the default of FALSE. It's
> nice to have a column of strings you can paste right into a genome browser
> (sorry Michael :-) ).  I often pass my bench collaborators a spreadsheet
> with such a column.
>
> Pete
>
> On Fri, Apr 24, 2015 at 10:50 AM, Hervé Pagès <hpages at fredhutch.org>
> wrote:
>
>> On 04/24/2015 10:21 AM, Michael Lawrence wrote:
>>
>>> Sorry, one more concern, if you're thinking of using as a range key, you
>>> will need the strand, but many use cases might not want the strand on
>>> there. Like for pasting into a genome browser.
>>>
>>
>> What about appending the strand only for GRanges objects that
>> have at least one range that is not on *?
>>
>> setMethod("as.character", "GenomicRanges",
>>     function(x)
>>     {
>>         if (length(x) == 0L)
>>             return(character(0))
>>         ans <- paste0(seqnames(x), ":", start(x), "-", end(x))
>>         if (any(strand(x) != "*"))
>>               ans <- paste0(ans, ":", strand(x))
>>         ans
>>     }
>> )
>>
>> > as.character(gr)
>>  [1] "chr1:1-10"  "chr2:2-10"  "chr2:3-10"  "chr2:4-10"  "chr1:5-10"
>>  [6] "chr1:6-10"  "chr3:7-10"  "chr3:8-10"  "chr3:9-10"  "chr3:10-10"
>>
>> > strand(gr)[2:3] <- c("-", "+")
>> > as.character(gr)
>>  [1] "chr1:1-10:*"  "chr2:2-10:-"  "chr2:3-10:+"  "chr2:4-10:*"  "chr1:5-10:*"
>>  [6] "chr1:6-10:*"  "chr3:7-10:*"  "chr3:8-10:*"  "chr3:9-10:*"  "chr3:10-10:*"
>>
>> H.
>>
>>
>>> On Fri, Apr 24, 2015 at 10:18 AM, Michael Lawrence
>>> <michafla at gene.com> wrote:
>>>
>>>     It is a great idea, but I'm not sure I would use it to implement
>>>     table(). Allocating those strings will be costly. Don't we already
>>>     have the 4-way int hash? Of course, my intuition might be completely
>>>     off here.
>>>
>>>
>>>     On Fri, Apr 24, 2015 at 9:59 AM, Hervé Pagès
>>>     <hpages at fredhutch.org> wrote:
>>>
>>>         Hi Pete,
>>>
>>>         Excellent idea. That will make things like table() work
>>>         out-of-the-box on GenomicRanges objects. I'll add that.
>>>
>>>         Thanks,
>>>         H.
>>>
>>>
>>>
>>>         On 04/24/2015 09:43 AM, Peter Haverty wrote:
>>>
>>>             Would people be interested in having this:
>>>
>>>             setMethod("as.character", "GenomicRanges",
>>>                         function(x) {
>>>                             paste0(seqnames(x), ":", start(x), "-", end(x))
>>>                         })
>>>
>>>             ?
>>>
>>>             I find myself doing that a lot to make unique names or for
>>>             output that goes to collaborators.  I suppose we might want
>>>             to tack on the strand if it isn't "*".  I have some code for
>>>             going the other direction too, if there is interest.
>>>
>>>
>>>
>>>             Pete
>>>
>>>
>>>             _______________________________________________
>>>             Bioc-devel at r-project.org mailing list
>>>             https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>>         --
>>>         Hervé Pagès
>>>
>>>         Program in Computational Biology
>>>         Division of Public Health Sciences
>>>         Fred Hutchinson Cancer Research Center
>>>         1100 Fairview Ave. N, M1-B514
>>>         P.O. Box 19024
>>>         Seattle, WA 98109-1024
>>>
>>>         E-mail: hpages at fredhutch.org
>>>         Phone: (206) 667-5791
>>>         Fax: (206) 667-1319
>>>
>>>
>>>
>>>
>>>
>>
>
>
