[Bioc-devel] Gene annotation: TxDb vs ENSEMBL/NCBI inconsistency
Robert M. Flight
rflight79 at gmail.com
Wed Jun 3 15:56:29 CEST 2015
Ludwig,
If you do this search on the UCSC Genome Browser (which this annotation
package is built from), you will see that the longest variant is what is
shown:
http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg38&position=brca1&hgt.positionInput=brca1&hgt.suggestTrack=knownGene&Submit=submit&hgsid=429339723_8sd4QD2jSAnAsa6cVCevtoOy4GAz&pix=1885
If you use "transcripts" instead of "genes", you will see 20 different
transcripts for this gene, including the one listed by NCBI.
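A minimal sketch of why the two end positions can differ, assuming the gene-level range is simply the span covering all of a gene's transcripts. It uses only the coordinates quoted in this thread. The two-row table and its labels are made up for illustration, as is the assumption that a single variant runs all the way to 43170403; the real gene has many more transcripts. The commented-out call at the end is one way the real per-transcript ranges might be pulled from the package.

```r
# Illustrative transcript table for BRCA1 on chr17 (minus strand).
# Coordinates come from this thread; the labels are hypothetical.
tx <- data.frame(
  tx_label = c("ncbi_ensembl_like", "longest_ucsc_variant"),
  start    = c(43044295, 43044295),
  end      = c(43125483, 43170403)
)

# A gene-level range spanning every transcript: smallest start, largest end.
gene_span <- c(start = min(tx$start), end = max(tx$end))
print(gene_span)

# How far the spanning range extends past the Ensembl/NCBI end position.
extra_bp <- gene_span[["end"]] - 43125483
print(extra_bp)

# With the annotation package installed, the real per-transcript ranges
# could be obtained with something like (not run here):
#   library(TxDb.Hsapiens.UCSC.hg38.knownGene)
#   txdb  <- TxDb.Hsapiens.UCSC.hg38.knownGene
#   tx_gr <- transcriptsBy(txdb, by = "gene")[["672"]]
#   range(tx_gr)
```

Under these assumptions the spanning range reproduces the coordinates returned by genes(), while the shorter transcript corresponds to the range reported by Ensembl and NCBI.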
I haven't tried it yet (I haven't upgraded R or Bioconductor to the latest
version), but there is now an Ensembl-based annotation package as well,
which might work better:
http://bioconductor.org/packages/release/data/annotation/html/EnsDb.Hsapiens.v79.html
-Robert
On Wed, Jun 3, 2015 at 7:04 AM Ludwig Geistlinger <
Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:
> Dear Bioc annotation team,
>
> Querying TxDb.Hsapiens.UCSC.hg38.knownGene for gene coordinates, e.g. for
>
> BRCA1; ENSG00000012048; entrez:672
>
> via
>
> > genes(TxDb.Hsapiens.UCSC.hg38.knownGene, vals=list(gene_id="672"))
>
> gives me:
>
> GRanges object with 1 range and 1 metadata column:
>       seqnames               ranges strand |     gene_id
>          <Rle>            <IRanges>  <Rle> | <character>
>   672    chr17 [43044295, 43170403]      - |         672
>   -------
>   seqinfo: 455 sequences (1 circular) from hg38 genome
>
>
> However, querying Ensembl and NCBI Gene
> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000012048
> http://www.ncbi.nlm.nih.gov/gene/672
>
> the gene is located at (note the difference in the end position)
>
> Chromosome 17: 43,044,295-43,125,483 reverse strand
>
>
> How is the inconsistency explained, and how can I extract an
> Ensembl/NCBI-conformant annotation from the TxDb object?
> (I am aware of biomaRt, but I want to explicitly use the Bioc annotation
> functionality.)
>
> Thanks!
> Ludwig
>
>
> --
> Dipl.-Bioinf. Ludwig Geistlinger
>
> Lehr- und Forschungseinheit für Bioinformatik
> Institut für Informatik
> Ludwig-Maximilians-Universität München
> Amalienstrasse 17, 2. Stock, Büro A201
> 80333 München
>
> Tel.: 089-2180-4067
> eMail: Ludwig.Geistlinger at bio.ifi.lmu.de
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>