[BioC] Is a number within a set of ranges?
Sean Davis
sdavis2 at mail.nih.gov
Mon Oct 29 18:14:10 CET 2007
Daniel Brewer wrote:
> I have a table with a start and stop column which defines a set of
> ranges. I have another table with a list of genes with associated
> position. What I would like to do is subset the gene table so it only
> contains genes whose position is within any of the ranges. What is the
> best way to do this? The only way I can think of is to construct a long
> list of conditions linked by ORs but I am sure there must be a better way.
>
> Simple example:
>
> Start Stop
> 1 3
> 5 9
> 13 15
>
> Gene Position
> 1 14
> 2 4
> 3 10
> 4 6
>
> I would like to get out:
> Gene Position
> 1 14
> 4 6
>
> Any ideas?
Here is a function that I use for finding overlapping segments. It
takes two data.frames, x and y. Each must have "Chr", "Position", and
"end" columns (often used in conjunction with snapCGH--hence, the
Position rather than "start"). The "shift" parameter is a convenience
function for doing "random shift" random distributions of genomic
segments. The function returns the indexes of x and y that overlap.
So, if the first row of the x data.frame overlaps with the first 3 rows
of y, the output will be:
Xindex Yindex
1 1
1 2
1 3
Note that the data.frames can have more than those three columns, but
those three columns MUST be present and named as mentioned.
Hope this helps.
Sean
Attached function below
-----------------------
findOverlappingSegments <-
function(x,y,shift=0) {
swap <- nrow(x)<nrow(y) # Want to have larger set first for speed
if(swap) {
tmpx <- x
x <- y
y <- tmpx
}
intersectChrom <- intersect(x$Chr,y$Chr)
ret <- list()
for(i in intersectChrom) {
aindex <- which(y$Chr==i)
bindex <- which(x$Chr==i)
a <- y[aindex,]
b <- x[bindex,]
overlapsBrow <- mapply(function(Astart, Aend) {
which((Astart <= b$end & Astart>=b$Position) |
(Aend <= b$end & Aend>=b$Position) |
(Astart <= b$Position & Aend>=b$end) |
(Astart >= b$Position & Aend<=b$end))
},a$Position+shift,a$end+shift)
tmp1 <- unlist(overlapsBrow)
xindex <- bindex[tmp1]
yindex <-
aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
if(swap) {
ret[[i]]<- cbind(yindex,xindex)
} else {
ret[[i]] <- cbind(xindex,yindex)
}
colnames(ret[[i]]) <- c('Xindex','Yindex')
}
return(do.call(rbind,ret))
}
More information about the Bioconductor
mailing list