[Bioc-devel] Extending GenomicRanges::`intra-range-methods`

Stuart Lee |ee@@ @end|ng |rom weh|@edu@@u
Tue Sep 17 05:33:36 CEST 2019


Hi Aditya,

I think straddle would be a great addition to plyranges. Happy for you to put in a PR and add you as a contributor.

Maybe instead of specifying the start etc. we could dispatch on anchored ranges instead? So we’d follow the anchor_start(gr) %>% straddle(). We could also have the directed version for considering strands.

https://github.com/sa-lee/plyranges

Thanks,
Stuart

---
Stuart Lee
Visiting PhD Student - Ritchie Lab



On 13 Sep 2019, at 22:38, Michael Lawrence <lawrence.michael using gene.com<mailto:lawrence.michael using gene.com>> wrote:

Thanks for these suggestions; I think they're worth considering.

I've never been totally satisfied with (my function) flank(), because
it's limited and its arguments are somewhat obscure in meaning. You
can check out what we did in plyranges:
https://rdrr.io/bioc/plyranges/man/flank-ranges.html. Your functions
are more flexible, because they are two-way about the endpoint, like
promoters(). Sometimes I've solved that with resize(flank()), but
that's not ideal.  Maybe a better name is "straddle" for when ranges
straddle one of the endpoints? In keeping with the current pattern of
Ranges API, there would be a single function: straddle(x, side, left,
right, ignore.strand=FALSE). So straddle(x, "start", -100, 10) would
be like promoters(x, 100, 10) for a positive or "*" strand range. That
brings up strandedness, which needs to be considered here. For
unstranded ranges, it may be that direct start() and end()
manipulation is actually more transparent than a special verb. I
wonder what Stuart Lee thinks?

The functions that involve reduce() wouldn't fit into the intrarange
operations, as they are summarizing ranges, not transforming them.
They may be going too far.

Michael

On Fri, Sep 13, 2019 at 4:48 AM Bhagwat, Aditya
<Aditya.Bhagwat using mpi-bn.mpg.de<mailto:Aditya.Bhagwat using mpi-bn.mpg.de>> wrote:

Dear bioc-devel,

The ?GenomicRanges::`intra-range-methods` are very useful for range arithmetic<https://genomicsclass.github.io/book/pages/figure/bioc1_igranges-unnamed-chunk-6-1.png>

Feedback request: would it be of general use to add the methods below to the GenomicRanges::`intra-range-methods` palette (after properly S4-ing them)?
Or shall I keep them in multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>?
Additional feedback welcome as well (e.g. re-implementation of already existing functionality).


1)     Left flank

#' Left flank
#' @param gr   \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param leftend   number: flank end   (relative to range start)
#' @return a \code{\link[GenomicRanges]{GRanges-class}}
#' @export
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' left_flank(gr)
left_flank <- function(gr, leftstart = -200, leftend   = -1){

   # Assert
   assert_is_identical_to_true(is(gr, 'GRanges'))
   assert_is_a_number(leftstart)
   assert_is_a_number(leftend)

   # Flank
   newranges <- gr
   end(newranges)   <- start(gr) + leftend
   start(newranges) <- start(gr) + leftstart

   # Return
   newranges
}


2)     Right flank

#' Right flank
#' @param gr    \code{\link[GenomicRanges]{GRanges-class}}
#' @param rightstart number: flank start (relative to range end)
#' @param rightend   number: flank end   (relative to range end)
#' @return     \code{\link[GenomicRanges]{GRanges-class}}
#' @export
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' right_flank(gr)
#' @export
right_flank <- function(gr, rightstart = 1, rightend   = 200){

   # Assert
   assert_is_identical_to_true(is(gr, 'GRanges'))
   assert_is_a_number(rightstart)
   assert_is_a_number(rightend)
   assert_is_a_bool(verbose)

   # Flank
   newranges <- gr
   start(newranges) <- end(newranges) + rightstart
   end(newranges)   <- end(newranges) + rightend

   # Plot
   if (plot)  plot_intervals(GRangesList(sites = gr, rightflanks = newranges))

   # Return
   cmessage('\t\t%d right flanks : [end%s%d, end%s%d]',
           length(newranges),
           csign(rightstart),
           abs(rightstart),
           csign(rightend),
           abs(rightend))
   newranges
}


3)     Slop

#' Slop (i.e. extend left/right)
#' @param gr        \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number: flank start (relative to range start)
#' @param rightend  number: flank end   (relative to range end)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @export
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' slop(gr)
#' @export
slop <- function(gr, leftstart = -22, rightend  =  22){

   # Assert
   assert_is_identical_to_true(methods::is(gr, 'GRanges'))
   assert_is_a_number(leftstart)
   assert_is_a_number(rightend)
   assert_is_a_bool(verbose)

   # Slop
   newranges <- gr
   start(newranges) <- start(newranges) + leftstart
   end(newranges)   <- end(newranges)   + rightend

   # Return
   newranges
}


4)     Flank fourways

#' Flank fourways
#'
#' Flank left and right, for both strands, and merge overlaps
#' @param gr          \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart   number: left flank start  (relative to range start)
#' @param leftend     number: left flank  end   (relative to range start)
#' @param rightstart  number: right flank start (relative to range end)
#' @param rightend    number: right flank end   (relative to range end)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' granges <- read_bed(bedfile, bsgenome)
#' flank_fourways(granges)
#' @export
flank_fourways <- function(gr, leftstart  = -200, leftend    =   -1, rightstart =    1, rightend   =  200){

   # Comply
   . <- NULL

   # Flank
   left <-  left_flank( gr, leftstart, leftend)
   right <- right_flank(gr,rightstart, rightend)
   newranges <- c(left, right)

   # Complement
   newranges %<>% c(invertStrand(.))

   # Merge overlaps
   newranges %<>% reduce() # GenomicRanges::reduce

   # Return
   newranges
}



5)     Slop fourways

#' Slop granges for both strands, merging overlaps
#' @param gr   \code{\link[GenomicRanges]{GRanges-class}}
#' @param leftstart number
#' @param rightend  number
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' gr <- read_bed(bedfile, bsgenome)
#' gr
#' slop_fourways(gr)
#' @export
slop_fourways <- function(gr, leftstart = -22, rightend  =  22){

   # Comply
   . <- NULL

   # Slop
   if (verbose) cmessage('\tSlop fourways')
   newranges <- slop(gr, leftstart, rightend, verbose = verbose)

   # Complement
   newranges %<>% c(invertStrand(.))

   # Merge overlaps
   newranges %<>% reduce()

   # Return
   newranges
}

Thankyou!

Aditya






       [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel using r-project.org<mailto:Bioc-devel using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla using gene.com<mailto:michafla using gene.com>

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube


_______________________________________________

The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.

The Walter and Eliza Hall Institute acknowledges the Wurundjeri people of the Kulin
Nation as the traditional owners of the land where our campuses are located and
the continuing connection to country and community.
_______________________________________________

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list