[BioC] Is a number within a set of ranges?

Herve Pages hpages at fhcrc.org
Mon Oct 29 21:33:58 CET 2007


Christos Hatzis wrote:
>> pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol=2)
>> pos
>      [,1] [,2]
> [1,]    1    3
> [2,]    5    9
> [3,]   13   15
>> gene.pos <- c(14,4,10,6)
>> gene.pos
> [1] 14  4 10  6
> 
>> within <- sapply(gene.pos, function(g) any(apply(pos, 1, function(x)
> findInterval(g, x)) == 1))
> 
>> gene.pos[within]
> [1] 14  6 

Good to know the existence of findInterval(). Thanks!
For this particular case though, I would be tempted to keep things simple
by replacing this

  any(apply(pos, 1, function(x) findInterval(g, x)) == 1)

by

  any(apply(pos, 1, function(x) x[1] <= g && g <= x[2]))

Not only is the latter easier to understand, but with the former you'll get
wrong results if one of your genes is positioned at one of the Stop positions:

  gene.pos <- c(14,4,10,6,15) # last gene is at a Stop position

  # using findInterval() gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE FALSE
  # using 'x[1] <= g && g <= x[2]' gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE  TRUE
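
For anyone who wants to reproduce the comparison above, here is a minimal
self-contained sketch. The names 'within_fi' and 'within_cmp' are only
illustrative (they are not from the original thread); 'pos' and 'gene.pos'
are the same toy data as above.

```r
# Ranges from the original question: one row per (Start, Stop) pair.
pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)
gene.pos <- c(14, 4, 10, 6, 15)  # last gene is at a Stop position

# findInterval() version: by default each interval is closed on the left
# and open on the right, so a gene sitting exactly on a Stop is missed.
within_fi <- sapply(gene.pos, function(g)
  any(apply(pos, 1, function(x) findInterval(g, x)) == 1))

# Explicit comparison version: closed on both ends.
within_cmp <- sapply(gene.pos, function(g)
  any(apply(pos, 1, function(x) x[1] <= g && g <= x[2])))

print(within_fi)             # TRUE FALSE FALSE TRUE FALSE
print(within_cmp)            # TRUE FALSE FALSE TRUE TRUE
print(gene.pos[within_cmp])  # 14 6 15
```

The two results differ only for the last gene (position 15), which is
exactly the Stop of the third range.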

Note that the findInterval() approach can be fixed by specifying
'rightmost.closed=TRUE', but this doesn't make the code easier to
understand, quite the contrary...
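
A rough sketch of what that fix could look like, on the same toy data
('within_rc' is just an illustrative name, not something from the thread):

```r
pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)
gene.pos <- c(14, 4, 10, 6, 15)  # last gene is at a Stop position

# With rightmost.closed = TRUE, a value equal to the last break point
# (here the Stop of the row) is assigned to interval 1 instead of 2.
within_rc <- sapply(gene.pos, function(g)
  any(apply(pos, 1, function(x)
    findInterval(g, x, rightmost.closed = TRUE)) == 1))

print(within_rc)  # TRUE FALSE FALSE TRUE TRUE
```

This works here because each call sees only one (Start, Stop) pair, so the
Stop is always the rightmost break point.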

Cheers,
H.

> 
> Look at ?findInterval, which does all the work.  It returns 1 if within
> range in this case.
> 
> -Christos 
> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch 
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
>> Daniel Brewer
>> Sent: Monday, October 29, 2007 12:29 PM
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] Is a number within a set of ranges?
>>
>> I have a table with a start and stop column which defines a 
>> set of ranges.  I have another table with a list of genes 
>> with associated position.  What I would like to do is subset 
>> the gene table so it only contains genes whose position is 
>> within any of the ranges.  What is the best way to do this?  
>> The only way I can think of is to construct a long list of 
>> conditions linked by ORs but I am sure there must be a better way.
>>
>> Simple example:
>>
>> Start	Stop
>> 1	3
>> 5	9
>> 13	15
>>
>> Gene	Position
>> 1	14
>> 2	4
>> 3	10
>> 4	6
>>
>> I would like to get out:
>> Gene	Position
>> 1	14
>> 4	6
>>
>> Any ideas?
>>
>> Thanks
>>
>> Dan
>>
>> --
>> **************************************************************
>> Daniel Brewer, Ph.D.
>> Institute of Cancer Research
>> Email: daniel.brewer at icr.ac.uk
>> **************************************************************
> 