[BioC] How to annotate genomic coordinates
James W. MacDonald
jmacdon at uw.edu
Thu Nov 8 15:46:20 CET 2012
Hi Jose,
On 11/8/2012 8:19 AM, José Luis Lavín wrote:
> Dear Bioconductor list,
>
> I write you this email asking for a Bioconductor module that allows me to
> annotate genomic coordinates and get different GeneIds.
> I'll show you an example of what I'm referring to:
>
> I have this data:
> Chromosome coordinate
> chr17 31246506
It depends on what that coordinate is. Is it the start of a transcript?
A SNP? Do you really just have a single coordinate, or do you have a
range? What species are we talking about here?
Depending on what your data are, you might want to use either one of the
TxDb packages, or a SNPlocs package. There really isn't much to go on
here. If I assume this is a coordinate that one might think is within an
exon, and if I further assume you are working with H. sapiens, I could
suggest something like
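library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# Exons grouped by gene; list names are Entrez Gene IDs
ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
# A one-base range at the coordinate from your example
x <- GRanges(seqnames = "chr17", ranges = IRanges(start = 31246506, width = 1))
# Keep the genes that have an exon overlapping x
ex[ex %over% x]
or maybe more appropriately
names(ex)[ex %over% x]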
which will give you the Gene ID, and you can go from there using the
org.Hs.eg.db package.
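As a rough sketch of that last step, the code below parses coordinates written in the "chr17.31246506" form, finds overlapping exons, and looks up symbols and Ensembl IDs for the resulting Entrez Gene IDs. It still assumes H. sapiens and hg19 coordinates, and the names coords, parts, chrom, pos, hits, gene_ids and annot are only illustrative. Whether any given position actually falls in an exon depends on the data, so the lookup is guarded.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

# Coordinates in the "chr17.31246506" form (assumed input format)
coords <- c("chr17.31246506")
parts  <- strsplit(coords, ".", fixed = TRUE)
chrom  <- vapply(parts, `[`, character(1), 1)
pos    <- as.integer(vapply(parts, `[`, character(1), 2))

# One-base ranges, one per input coordinate
x <- GRanges(seqnames = chrom, ranges = IRanges(start = pos, width = 1))

# Exons grouped by gene; names(ex) are Entrez Gene IDs
ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")

# Which genes have an exon overlapping each coordinate?
hits     <- findOverlaps(x, ex)
gene_ids <- unique(names(ex)[subjectHits(hits)])

# Map Entrez Gene IDs to symbols and Ensembl gene IDs (only if something was hit)
annot <- NULL
if (length(gene_ids) > 0) {
  annot <- AnnotationDbi::select(org.Hs.eg.db, keys = gene_ids,
                                 columns = c("SYMBOL", "ENSEMBL"),
                                 keytype = "ENTREZID")
}
annot
```

If annot comes back NULL, the coordinate did not overlap any exon in this annotation, which may also mean the assumed species or genome build is wrong.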
If, however, your coordinate isn't in an exon but might be in a UTR, you
can look at ?exonsBy to see which other features you can extract to
compare against.
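A minimal sketch of that kind of check, again assuming H. sapiens and hg19, could test the same position against UTRs and introns. The extractors used here are the ones I know to be documented alongside exonsBy in GenomicFeatures; the names txdb, regions and in_region are just for illustration.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# One-base range at the coordinate of interest
x <- GRanges(seqnames = "chr17", ranges = IRanges(start = 31246506, width = 1))

# Other feature sets, each grouped by transcript
regions <- list(
  fiveUTR  = fiveUTRsByTranscript(txdb),
  threeUTR = threeUTRsByTranscript(txdb),
  intron   = intronsByTranscript(txdb)
)

# TRUE for each feature type that overlaps the position
in_region <- vapply(regions, function(r) any(overlapsAny(x, r)), logical(1))
in_region
```

All FALSE here, together with no exon hit, would suggest an intergenic position or a mismatch in species or build.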
If these are SNPs, then you can look at the help pages for the relevant
SNPlocs package.
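For the SNP case, something along these lines might work. The package SNPlocs.Hsapiens.dbSNP144.GRCh37 is only my assumption about which SNPlocs release fits; pick the one that matches your species and build. That package names chromosomes in NCBI style ("17" rather than "chr17"), so the query range is converted first.

```r
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)  # assumed choice of SNPlocs package
snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37

# One-base range at the coordinate of interest
x <- GRanges(seqnames = "chr17", ranges = IRanges(start = 31246506, width = 1))

# Match the chromosome naming used by the SNPlocs package ("chr17" -> "17")
seqlevelsStyle(x) <- "NCBI"

# Known SNPs at that position; an empty result means none is recorded there
hits <- snpsByOverlaps(snps, x)
hits
```

The rs identifiers, if any, are in the metadata columns of the returned object.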
Best,
Jim
>
> which can also be written this way by the program that yielded the result:
> chr17.31246506
>
> And I need to convert this data into a gene name and known gene Ids, such
> as:
>
> Gene name Entrez_ID Ensembl_ID
>
> Tff3 NM_011575 20050
> Can you please advise me about a module able to perform this ID conversion
> using a list of "chr17.31246506" type coordinates as input?
>
> Thanks in advance
>
> With best wishes
>
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099