[BioC] Is a number within a set of ranges?
James W. MacDonald
jmacdon at med.umich.edu
Mon Oct 29 19:15:30 CET 2007
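Or a simpler alternative that will work with the data provided (the comparisons are strict, so a position equal to a Start or Stop value would not count as inside the range):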
> mat <- matrix(c(1,5,13,3,9,15), ncol=2)
> gn <- matrix(c(14,4,10,6), ncol=1)
> a <- apply(gn, 1, function(x) any(x > mat[,1] & x < mat[,2]))
> gn[a,]
[1] 14  6
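If positions that sit exactly on a range boundary should also count as hits, a variant along the following lines might do. This is only a sketch: the data.frame names `ranges` and `genes` and the column names are assumptions taken from the example tables in this thread, and `outer()` builds a full genes-by-ranges matrix, so it is meant for small inputs.

```r
# Ranges and positions from the example in this thread
ranges <- data.frame(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes  <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# inside[i, j] is TRUE when gene i lies in range j (endpoints included)
inside <- outer(genes$Position, ranges$Start, ">=") &
          outer(genes$Position, ranges$Stop,  "<=")

# Keep the genes that fall in at least one range
hits <- genes[rowSums(inside) > 0, ]
print(hits)
```

With the example data this keeps genes 1 and 4, the same rows as the strict version, because none of the positions lies on a boundary.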
Best,
Jim
Sean Davis wrote:
> Daniel Brewer wrote:
>> I have a table with a start and stop column which defines a set of
>> ranges. I have another table with a list of genes with associated
>> position. What I would like to do is subset the gene table so it only
>> contains genes whose position is within any of the ranges. What is the
>> best way to do this? The only way I can think of is to construct a long
>> list of conditions linked by ORs but I am sure there must be a better way.
>>
>> Simple example:
>>
>> Start Stop
>>     1    3
>>     5    9
>>    13   15
>>
>> Gene Position
>>    1       14
>>    2        4
>>    3       10
>>    4        6
>>
>> I would like to get out:
>> Gene Position
>>    1       14
>>    4        6
>>
>> Any ideas?
>
> Here is a function that I use for finding overlapping segments. It
> takes two data.frames, x and y. Each must have "Chr", "Position", and
> "end" columns (I often use it in conjunction with snapCGH, hence
> "Position" rather than "start"). The "shift" parameter is a convenience
> for building "random shift" distributions of genomic segments: it is
> added to the start and end of one set of segments before they are
> compared. The function returns the row indexes of x and y that overlap.
> So, if the first row of the x data.frame overlaps with the first 3 rows
> of y, the output will be:
>
> Xindex Yindex
>      1      1
>      1      2
>      1      3
>
> Note that the data.frames can have more than those three columns, but
> those three columns MUST be present and named as mentioned.
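Since the result is a matrix of row indexes rather than a subset, one more step is needed to get the overlapping rows themselves. The sketch below uses a hand-built index matrix in the format described above, not actual output from the function, and the two small data.frames are made up for illustration.

```r
# Made-up segments: row 1 of x overlaps rows 1-3 of y, row 2 of x overlaps nothing
x <- data.frame(Chr = 1, Position = c(10, 50), end = c(40, 60))
y <- data.frame(Chr = 1, Position = c(5, 20, 35, 100), end = c(12, 25, 45, 110))

# Hypothetical result in the Xindex/Yindex format described above
idx <- cbind(Xindex = c(1, 1, 1), Yindex = c(1, 2, 3))

# unique() drops repeated indexes, so each overlapping row is returned once
xHits <- x[unique(idx[, "Xindex"]), ]
yHits <- y[unique(idx[, "Yindex"]), ]
print(xHits)
print(yHits)
```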
>
> Hope this helps.
>
> Sean
>
> Attached function below
> -----------------------
>
> findOverlappingSegments <-
>   function(x, y, shift=0) {
>     swap <- nrow(x) < nrow(y)  # Want to have larger set first for speed
>     if(swap) {
>       tmpx <- x
>       x <- y
>       y <- tmpx
>     }
>     intersectChrom <- intersect(x$Chr, y$Chr)
>     ret <- list()
>     for(i in intersectChrom) {
>       aindex <- which(y$Chr==i)
>       bindex <- which(x$Chr==i)
>       a <- y[aindex,]
>       b <- x[bindex,]
>       # For each segment in a, find the rows of b that it overlaps.
>       # SIMPLIFY=FALSE keeps the result a list even when every segment
>       # has the same number of overlaps.
>       overlapsBrow <- mapply(function(Astart, Aend) {
>         which((Astart <= b$end & Astart >= b$Position) |
>               (Aend <= b$end & Aend >= b$Position) |
>               (Astart <= b$Position & Aend >= b$end) |
>               (Astart >= b$Position & Aend <= b$end))
>       }, a$Position+shift, a$end+shift, SIMPLIFY=FALSE)
>       tmp1 <- unlist(overlapsBrow)
>       xindex <- bindex[tmp1]
>       yindex <- aindex[rep(seq_len(nrow(a)), sapply(overlapsBrow, length))]
>       if(swap) {
>         ret[[i]] <- cbind(yindex, xindex)
>       } else {
>         ret[[i]] <- cbind(xindex, yindex)
>       }
>       colnames(ret[[i]]) <- c('Xindex','Yindex')
>     }
>     return(do.call(rbind, ret))
>   }
>
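The four-way overlap test in the function above can be read as a single rule: two closed segments overlap when each one starts no later than the other ends. The sketch below checks that reading by brute force on small integer intervals. The helper names `overlaps2` and `overlaps4` are hypothetical and not part of the posted function, and the equivalence assumes every segment has start <= end.

```r
# Hypothetical helper: two-condition test for closed intervals
overlaps2 <- function(aStart, aEnd, bStart, bEnd) {
  aStart <= bEnd & aEnd >= bStart
}

# The four-condition test as written in the posted function
overlaps4 <- function(aStart, aEnd, bStart, bEnd) {
  (aStart <= bEnd & aStart >= bStart) |
  (aEnd <= bEnd & aEnd >= bStart) |
  (aStart <= bStart & aEnd >= bEnd) |
  (aStart >= bStart & aEnd <= bEnd)
}

# All well-formed intervals with endpoints in 1..6
iv <- expand.grid(start = 1:6, end = 1:6)
iv <- iv[iv$start <= iv$end, ]

# Every ordered pair of those intervals
pairs <- expand.grid(a = seq_len(nrow(iv)), b = seq_len(nrow(iv)))
res2 <- overlaps2(iv$start[pairs$a], iv$end[pairs$a],
                  iv$start[pairs$b], iv$end[pairs$b])
res4 <- overlaps4(iv$start[pairs$a], iv$end[pairs$a],
                  iv$start[pairs$b], iv$end[pairs$b])

# TRUE when the two tests agree on every pair
agree <- all(res2 == res4)
print(agree)
```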
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623