[BioC] Is a number within a set of ranges?

Sean Davis sdavis2 at mail.nih.gov
Mon Oct 29 18:14:10 CET 2007


Daniel Brewer wrote:
> I have a table with a start and stop column which defines a set of
> ranges.  I have another table with a list of genes with associated
> position.  What I would like to do is subset the gene table so it only
> contains genes whose position is within any of the ranges.  What is the
> best way to do this?  The only way I can think of is to construct a long
> list of conditions linked by ORs but I am sure there must be a better way.
> 
> Simple example:
> 
> Start	Stop
> 1	3
> 5	9
> 13	15
> 
> Gene	Position
> 1	14
> 2	4
> 3	10
> 4	6
> 
> I would like to get out:
> Gene	Position
> 1	14
> 4	6
> 
> Any ideas?

Here is a function that I use for finding overlapping segments.  It
takes two data.frames, x and y.  Each must have "Chr", "Position", and
"end" columns (often used in conjunction with snapCGH--hence, the
Position rather than "start").  The "shift" parameter is a convenience
function for doing "random shift" random distributions of genomic
segments.  The function returns the indexes of x and y that overlap.
So, if the first row of the x data.frame overlaps with the first 3 rows
of y, the output will be:

Xindex	Yindex
1	1
1	2
1	3

Note that the data.frames can have more than those three columns, but
those three columns MUST be present and named as mentioned.

Hope this helps.

Sean

Attached function below
-----------------------

findOverlappingSegments <-
  function(x,y,shift=0) {
    swap <- nrow(x)<nrow(y) # Want to have larger set first for speed
    if(swap) {
      tmpx <- x
      x <- y
      y <- tmpx
    }
    intersectChrom <- intersect(x$Chr,y$Chr)
    ret <- list()
    for(i in intersectChrom) {
      aindex <- which(y$Chr==i)
      bindex <- which(x$Chr==i)
      a <- y[aindex,]
      b <- x[bindex,]
      overlapsBrow <- mapply(function(Astart, Aend) {
        which((Astart <= b$end & Astart>=b$Position) |
              (Aend <= b$end & Aend>=b$Position) |
              (Astart <= b$Position & Aend>=b$end) |
              (Astart >= b$Position & Aend<=b$end))
      },a$Position+shift,a$end+shift)
      tmp1 <- unlist(overlapsBrow)
      xindex <- bindex[tmp1]
      yindex <-
aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
      if(swap) {
        ret[[i]]<- cbind(yindex,xindex)
      } else {
        ret[[i]] <- cbind(xindex,yindex)
      }
      colnames(ret[[i]]) <- c('Xindex','Yindex')
    }
    return(do.call(rbind,ret))
  }



More information about the Bioconductor mailing list