[Bioc-sig-seq] Short read overlap function

Joern Toedling Joern.Toedling at curie.fr
Fri Jul 24 17:28:36 CEST 2009


Hello,

I think the functions you mention do qualify as 'proper' overlap functions;
what you probably want is a relatively straightforward post-processing of
their result. Below is a function that I have written to restrict overlapping
pairs to those that overlap by at least a specified fraction of the shorter
interval's length. Setting this fraction to 1.0 returns only pairs in which
one interval is contained in the other. The function uses genomeIntervals,
but I am sure that post-processing the IRanges result is equally
straightforward.

Hope this helps,
Joern


fracOverlap <- function(I1, I2, min.frac=1.0){
  require("genomeIntervals")
  # both arguments must be Genome_intervals objects
  stopifnot(inherits(I1,"Genome_intervals"),
            inherits(I2,"Genome_intervals"))
  ov <- interval_overlap(I1,I2)
  # get base pair overlap
  lens <- sapply(ov, length)
  # seq_along() also behaves correctly when 'ov' is empty
  overlap1 <- rep(seq_along(ov), lens)
  overlap2 <- unlist(ov, use.names=FALSE)
  left <- pmax(I1[overlap1,1], I2[overlap2,1])
  right <- pmin(I1[overlap1,2], I2[overlap2,2])
  stopifnot(all(right >= left))
  bases <- right-left+1
  min.len <- pmin(I1[overlap1,2]- I1[overlap1,1]+1,
                  I2[overlap2,2]- I2[overlap2,1]+1)
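  frac <- bases/min.len
  res <- data.frame("Index1"=overlap1, "Index2"=overlap2,
                    "n"=bases, "fraction"=round(frac, digits=2))
  # filter on the unrounded fraction, so that a pair with fraction 0.996
  # is not rounded up to 1.00 and accepted when min.frac=1.0
  res <- res[frac >= min.frac, ]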
  return(res)
}# fracOverlap
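
To make the calculation easier to follow without any package installed, here
is a minimal base-R sketch of the same idea. It is not the function above:
the name fracOverlapBase and its start/end arguments are made up for this
illustration, it assumes closed, integer-valued intervals, and it simply
compares all pairs, so it is only suitable for small inputs.

```r
# Hypothetical helper for illustration; not part of genomeIntervals or IRanges.
# Intervals are assumed closed and integer-valued, given as start/end vectors.
fracOverlapBase <- function(start1, end1, start2, end2, min.frac = 1.0) {
  stopifnot(length(start1) == length(end1),
            length(start2) == length(end2),
            all(end1 >= start1), all(end2 >= start2))
  # all pairs (Index1, Index2) of intervals from set 1 and set 2
  idx <- expand.grid(Index1 = seq_along(start1), Index2 = seq_along(start2))
  left  <- pmax(start1[idx$Index1], start2[idx$Index2])
  right <- pmin(end1[idx$Index1], end2[idx$Index2])
  # number of shared positions; 0 if the pair does not overlap
  bases <- pmax(right - left + 1, 0)
  # length of the shorter interval of each pair
  min.len <- pmin(end1[idx$Index1] - start1[idx$Index1] + 1,
                  end2[idx$Index2] - start2[idx$Index2] + 1)
  frac <- bases / min.len
  keep <- bases > 0 & frac >= min.frac
  res <- data.frame(idx[keep, , drop = FALSE],
                    n = bases[keep], fraction = frac[keep])
  rownames(res) <- NULL
  res
}

# The two cases from the question, with Range 1 = [1, 23]:
# Range 2 = [6, 22] is contained, Range 2 = [18, 34] overlaps only partially.
res <- fracOverlapBase(start1 = 1, end1 = 23,
                       start2 = c(6, 18), end2 = c(22, 34))
print(res)
```

With min.frac = 1.0 only the contained pair is kept; lowering min.frac also
admits partial overlaps. For real data the pairwise comparison should be
replaced by a proper overlap search such as interval_overlap.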


On Fri, 24 Jul 2009 16:47:04 +0200, Johannes Waage wrote
> Hi all,
> 
> In assigning RNA-seq data to exon-models, I'm looking for a proper overlap
> function. Both IRanges and genomeIntervals have overlap functions, 
> but as far as I can see, these don't have options for contained 
> overlaps, example:
> 
> |-------Range 1-------]
>      [----Range 2----]
> 
> IRanges, genomeIntervals: TRUE
> Wanted: TRUE
> 
> |-------Range 1-------]
>                  [----Range 2----]
> 
> IRanges, genomeIntervals: TRUE
> Wanted: FALSE
> 
> Any suggestions are appreciated!
> 
> Regards,
> Johannes Waage,
> Uni. of Copenhagen


---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246926


