[BioC] Genome position to miRNA or gene name

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Jan 23 00:33:39 CET 2009


On Jan 22, 2009, at 12:27 AM, Martin Morgan wrote:

> ... [lots of snippage] ...
> Very roughly, this
>  fl <- "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/genomes/hsa.gff"
> seems to be the human miRNA database from the Sanger Institute. It parses
> into an R data frame with
>  gff <- read.table(fl)
> (hey, that's cool -- pulling the data directly from FTP!). The fourth
> and fifth columns are the chromosomal start and end positions; there is
> also information on the chromosome (column 1) and strand (column 7).

I never paid close enough attention to what kinds of sources read.table()
accepts to realize you could do that ... that's sweet.

> My strategy would be to loop over your aligned sequences by chromosome
> and strand, and for each subset construct an IRanges object (from the
> IRanges package) that contains the start and end position of all
> sequences. Suppose we have the 'start' and 'end' of each sequence on
> chromosome 1
>  seqs <- IRanges(start, end)
> and the same for the gff data
>  miRNAs <- with(gff[gff$V1 == "1" & gff$V7 == "+",], IRanges(V4, V5))
> Then use 'overlap' from IRanges. Along the lines of
>  overlap(miRNAs, seqs, multiple=FALSE)
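
The loop-by-chromosome-and-strand strategy can be sketched without Bioconductor as below. This is a minimal sketch under stated assumptions: the reads and miRNA tables are hypothetical, and the brute-force scan inside `first_overlap()` (a hypothetical helper) is a plain stand-in for the IRanges query, not the package's interval-tree implementation. It returns one hit per read, similar in spirit to `multiple=FALSE`, though the exact semantics of that argument are not reproduced here. Current IRanges releases provide `findOverlaps()` for this kind of query; the `overlap()` call quoted above reflects the API at the time of this thread.

```r
# Hypothetical aligned reads (made-up values).
reads <- data.frame(
  chrom  = c("1", "1", "1", "2"),
  strand = c("+", "+", "-", "+"),
  start  = c(1050, 3000, 5010, 2070),
  end    = c(1070, 3020, 5030, 2100),
  stringsAsFactors = FALSE
)

# Hypothetical miRNA annotation using the GFF column names from the message:
# V1 = chromosome, V4 = start, V5 = end, V7 = strand.
mirs <- data.frame(
  V1 = c("1", "1", "2"),
  V4 = c(1000, 5000, 2000),
  V5 = c(1080, 5090, 2075),
  V7 = c("+", "-", "+"),
  ID = c("mir-a", "mir-b", "mir-c"),
  stringsAsFactors = FALSE
)

# For each read, return the row index of the first miRNA on the same
# chromosome and strand that it overlaps, or NA if there is none.
# Brute force: O(reads x miRNAs) within each chromosome/strand group.
first_overlap <- function(reads, mirs) {
  hit <- rep(NA_integer_, nrow(reads))
  groups <- split(seq_len(nrow(reads)), paste(reads$chrom, reads$strand))
  for (key in names(groups)) {
    m <- which(paste(mirs$V1, mirs$V7) == key)  # miRNAs in this group
    if (length(m) == 0) next
    for (i in groups[[key]]) {
      # Closed intervals [s1, e1] and [s2, e2] overlap iff s1 <= e2 and s2 <= e1.
      ok <- m[mirs$V4[m] <= reads$end[i] & reads$start[i] <= mirs$V5[m]]
      if (length(ok) > 0) hit[i] <- ok[1]
    }
  }
  hit
}

idx <- first_overlap(reads, mirs)
annotated <- data.frame(reads, miRNA = mirs$ID[idx], stringsAsFactors = FALSE)
```

For genome-scale data an interval tree or sorted-interval search avoids the quadratic scan, which is the point of using IRanges instead.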

A while ago, when I did something similar to what was asked here
(essentially finding where my own regions of the genome "collided" with
annotated regions), I did pretty much what I described before.

It involved subtracting my start and end sites from the annotated start
and end sites, then checking whether the resulting differences combined
in one of several ways that indicated a collision/hit. There was a fair
share of tedium involved in that ...
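
The "several ways" a hit can happen are easy to enumerate, and they all reduce to one comparison. A minimal sketch, assuming closed, 1-based intervals as in GFF; the function names and case labels are hypothetical, not taken from any package.

```r
# Classify how a query interval [q_start, q_end] relates to an annotated
# interval [a_start, a_end]. Labels are illustrative only.
classify <- function(q_start, q_end, a_start, a_end) {
  if (q_end < a_start || q_start > a_end) return("none")
  if (q_start >= a_start && q_end <= a_end) return("within")
  if (q_start <= a_start && q_end >= a_end) return("contains")
  if (q_start < a_start) return("overlaps_start")  # then q_end < a_end
  "overlaps_end"                                   # q_start > a_start, q_end > a_end
}

# Every non-"none" case collapses to a single test.
collides <- function(q_start, q_end, a_start, a_end) {
  q_start <= a_end && a_start <= q_end
}

# Check the shortcut against the case analysis on a small grid of queries
# against the annotated interval [3, 4]; stopifnot() errors on any mismatch.
for (qs in 1:6) for (qe in qs:6) {
  stopifnot((classify(qs, qe, 3, 4) != "none") == collides(qs, qe, 3, 4))
}
```

Because of this reduction, an overlap query only ever needs the single `collides()` condition per candidate pair; the case labels matter only if the kind of hit is of interest.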

Thanks for pointing out how to do it the smart way by leveraging code  
already written in the packages, which led to tripping over the
Interval Tree data structure as well. This might have been the coolest  
thing I've seen all day ;-)

Note to self: look at the IRanges and related packages much more
closely ...

Thanks again,

Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
