[BioC] Genome position to miRNA or gene name
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Jan 20 21:19:44 CET 2009
Hi,
> I have got a set of human genome locations that I have found using
> Biostrings and BSGenome alignment e.g.
>
> seqname start end strand patternID
> chr9 95978065 95978085 + TGAGGTAGTAGGTTGTATAGT
> chr11 121522487 121522507 - TGAGGTAGTAGGTTGTATAGT
> chr22 44887296 44887316 + TGAGGTAGTAGGTTGTATAGT
> chr22 44888235 44888256 + TGAGGTAGTAGGTTGTGTGGTT
>
> What I would like to know is whether this genome location is within a
> known miRNA or gene. What would the best way be to go about this?
One way could be to grab the appropriate GTF file for Hsapiens here:
ftp://ftp.ensembl.org/pub/current_gtf/
It's just a tab-delimited file with genome annotations. You can
collect the lines for miRNA annotations and see if your positions fall
within the bounds of the known/annotated miRNAs.
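The filter-and-compare idea could be sketched roughly like this in Python. Treat it as a sketch under assumptions rather than a tested recipe: it assumes the nine-column, tab-delimited layout of the annotation line quoted in this message, that column 2 holds the biotype ("miRNA"), that the GTF names chromosomes without the "chr" prefix, and that coordinates are 1-based and inclusive. The function names, the second sample record, and the chr18 query are made up for illustration.

```python
import io


def read_mirna_features(handle):
    """Collect (chrom, start, end, strand, attributes) for miRNA lines of a GTF."""
    features = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete annotation line
        # Assumption: column 2 carries the biotype, as in the quoted example.
        if fields[1] != "miRNA":
            continue
        features.append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8])
        )
    return features


def strip_chr(name):
    """Turn 'chr9' into '9' so names match the assumed GTF convention."""
    return name[3:] if name.startswith("chr") else name


def overlapping(hit, features, same_strand=True):
    """Return the features whose interval overlaps hit = (chrom, start, end, strand)."""
    chrom, start, end, strand = hit
    chrom = strip_chr(chrom)
    found = []
    for feature in features:
        f_chrom, f_start, f_end, f_strand, _ = feature
        if f_chrom != chrom:
            continue
        if same_strand and f_strand != strand:
            continue
        # Two closed intervals overlap when each one starts before the other ends.
        if f_start <= end and start <= f_end:
            found.append(feature)
    return found


# First record: the line quoted in this message (attributes cut to gene_id).
# Second record: a made-up non-miRNA line, only there to exercise the filter.
sample_gtf = (
    '18\tmiRNA\texon\t38162\t38272\t.\t+\t.\tgene_id "ENSG00000221441";\n'
    '18\tprotein_coding\texon\t50000\t50100\t.\t+\t.\tgene_id "HYPOTHETICAL_GENE";\n'
)
features = read_mirna_features(io.StringIO(sample_gtf))

# Hypothetical query position inside the quoted miRNA, plus one hit from the question.
print(overlapping(("chr18", 38200, 38220, "+"), features))
print(overlapping(("chr9", 95978065, 95978085, "+"), features))
```

With the real file you would pass an open file handle instead of the `io.StringIO` sample. The linear scan is fine for a handful of hits; for many positions, grouping the features by chromosome first would probably be worthwhile.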
For example, here's one such miRNA line from the GTF file:
18 miRNA exon 38162 38272 . + . gene_id "ENSG00000221441" ...
You can also get miRNA data from miRBase
(http://microrna.sanger.ac.uk/sequences/); perhaps it's a more complete
set of data? The data file I have from them doesn't have chromosome
positions, but does have the stem-loop sequence, so that would require
an intermediate alignment step before getting at what you're after.
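A cheaper stand-in for that alignment step, if all you want is a name for each pattern, might be to check whether the pattern occurs inside any stem-loop sequence. The Python sketch below does only that: exact, same-strand substring matching after converting the RNA alphabet (U) to DNA (T). It is not a real alignment and it does not recover genomic coordinates. The function names are made up, and the two hairpin records are toy sequences built for illustration, not real miRBase entries.

```python
def rna_to_dna(seq):
    """Upper-case the sequence and swap U for T so it can be compared to DNA."""
    return seq.upper().replace("U", "T")


def find_in_hairpins(query_dna, hairpins):
    """Return (name, 0-based offset) for each hairpin containing the query exactly."""
    query = query_dna.upper()
    hits = []
    for name, rna in hairpins.items():
        offset = rna_to_dna(rna).find(query)
        if offset != -1:
            hits.append((name, offset))
    return hits


# Toy records (NOT real miRBase data): the first embeds the RNA form of one
# pattern from the question between arbitrary flanks, the second is unrelated.
toy_hairpins = {
    "toy-hairpin-1": "GGGA" + "UGAGGUAGUAGGUUGUAUAGU" + "CCCA",
    "toy-hairpin-2": "ACGUACGUACGUACGUACGU",
}

print(find_in_hairpins("TGAGGTAGTAGGTTGTATAGT", toy_hairpins))
```

In practice the dictionary would be filled from the miRBase stem-loop file, and anything needing mismatches or reverse-strand matches would call for a proper aligner.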
Probably not the best way, but a way nonetheless. If you're
comfortable using biomaRt, I'm sure there's a way to pull down the
same annotation info and do the comparison from there.
Sorry ... no R code, but hopefully it's helpful nonetheless :-)
-steve
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos