[BioC] Genome position to miRNA or gene name
mailinglist.honeypot at gmail.com
Fri Jan 23 00:33:39 CET 2009
On Jan 22, 2009, at 12:27 AM, Martin Morgan wrote:
> ... [lots of snippage] ...
> Very roughly, this
> fl <- "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/genomes/hsa.gff"
> seems to be the Human miRNA database from the Sanger. It parses into
> an R data frame with
> gff <- read.table(fl)
> (hey, that's cool -- pulling the data directly from ftp!). The fourth
> and fifth columns are chromosomal positions; there is also information
> on chromosome (column 1) and strand (column 7).
I never paid close enough attention to what types of sources were
expected here to realize you can do that ... that's sweet.
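For anyone following along, here is a minimal sketch of the parsing step that does not need the network. The toy lines and the column names are my own assumptions (the names follow the usual nine-column GFF layout); the coordinates are made up and are not real miRBase records.

```r
# Toy stand-in for a few tab-separated GFF lines (made-up coordinates).
gff_text <- paste(
  "# comment lines like this one are skipped by read.table",
  "1\t.\tmiRNA\t100\t180\t.\t+\t.\tID=toy-mir-1;",
  "1\t.\tmiRNA\t500\t570\t.\t-\t.\tID=toy-mir-2;",
  "2\t.\tmiRNA\t40\t120\t.\t+\t.\tID=toy-mir-3;",
  sep = "\n"
)

# Read the text as a data frame; chromosome is kept as character so that
# names such as "X" would also work.
gff <- read.table(
  text = gff_text, sep = "\t", quote = "", stringsAsFactors = FALSE,
  col.names = c("chrom", "source", "feature", "start", "end",
                "score", "strand", "frame", "attributes"),
  colClasses = c("character", "character", "character",
                 "integer", "integer", rep("character", 4))
)

# Subset to one chromosome and one strand, as in the quoted suggestion.
plus_chr1 <- gff[gff$chrom == "1" & gff$strand == "+", ]
print(plus_chr1[, c("chrom", "start", "end", "strand")])
```

Naming the columns is only a convenience; the quoted code works with the default V1 ... V9 names just as well.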
> My strategy would be to loop over your aligned sequences by chromosome
> and strand, and for each subset construct an IRanges object (from the
> IRanges package) that contains the start and end position of all
> sequences. Suppose we have the 'start' and 'end' of each sequence on
> chromosome 1
> seqs <- IRanges(start, end)
> and the same for the gff data
> miRNAs <- with(gff[gff$V1 == "1" & gff$V7 == "+",], IRanges(V4, V5))
> Then use 'overlap' from IRanges. Along the lines of
> overlap(miRNAs, seqs, multiple=FALSE)
When I did a similar thing to what was asked for a while ago
(essentially seeing where my own regions of the genome "collided" with
annotated regions), I did pretty much what I described before.
It involved subtracting my start sites from the annotated start sites
and checking to see if the subtracted start/ends combined in one of
several ways to give me a collision/hit. There was a fair share of
tedium involved in that ...
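To give a flavor of the manual route, here is a rough base-R sketch. It is not the exact bookkeeping I used; it relies on the standard test that two closed intervals overlap when each one starts no later than the other ends. The function name and the toy coordinates are hypothetical.

```r
# TRUE when closed intervals [s1, e1] and [s2, e2] share at least one position.
intervals_collide <- function(s1, e1, s2, e2) {
  s1 <= e2 & s2 <= e1
}

# Hypothetical toy coordinates, all on one chromosome and strand.
my_start  <- c(90, 300, 560)
my_end    <- c(110, 350, 600)
ann_start <- c(100, 500)
ann_end   <- c(180, 570)

# Compare every region of mine against every annotation
# (rows = my regions, columns = annotations).
hits <- outer(
  seq_along(my_start), seq_along(ann_start),
  function(i, j) intervals_collide(my_start[i], my_end[i],
                                   ann_start[j], ann_end[j])
)
print(hits)
```

This compares all pairs, so the work grows with the product of the two set sizes, which is part of why it gets tedious (and slow) on real data.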
Thanks for pointing out how to do it the smart way by leveraging code
already written in the packages, which led to tripping over the
Interval Tree data structure as well. This might have been the coolest
thing I've seen all day ;-)
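A small sketch of the package-based approach on toy data follows. It assumes a more recent IRanges release in which `findOverlaps` is available; treating it as a stand-in for `overlap(..., multiple=FALSE)` is my assumption, not something confirmed in this thread. The coordinates are made up.

```r
library(IRanges)  # Bioconductor package; assumed to be installed

# Hypothetical toy coordinates for one chromosome and strand.
seqs   <- IRanges(start = c(90, 300, 560), end = c(110, 350, 600))
miRNAs <- IRanges(start = c(100, 500),     end = c(180, 570))

# For each aligned sequence, the index of the first overlapping miRNA,
# or NA when nothing overlaps.
first_hit <- findOverlaps(seqs, miRNAs, select = "first")
print(first_hit)
```

The same call would be repeated for each chromosome/strand subset, as suggested in the quoted text.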
Note to self is to look at the IRanges and related packages much more
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University