[Bioc-devel] Known Genes replaced by GENCODE genes at UCSC
pgrosu at gmail.com
Mon Jan 11 21:40:25 CET 2016
Tim, you always crack me up! :) I totally agree, and it would probably be
good to also have tools that download directly from Ensembl, NCBI,
cloud-annotation sources, etc. and build/update the AnnDbBimap objects. This
way the annotation sources maintain the data and we maintain the scripts,
while still shipping the pre-built AnnDbBimap objects just in case.
From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Tim
Sent: Monday, January 11, 2016 2:02 PM
To: Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Known Genes replaced by GENCODE genes at UCSC
knownGene was always a disaster. For extra amusement/horror, be sure to
check out the sad saga of the TCGA GAF and its disconnection from knownGene
as well as reality. Three cheers for rendering transcript-level estimates
useless (and no, this was not Katie's fault).
Rainer and many others have made a herculean effort to bring all the BioC
annotation infrastructure into the 21st century... having worked with
Kallisto extensively of late, I see no reason to use a non-ENSEMBL
"conservative" reference transcriptome (I see plenty of reasons to use
miTranscriptome, etc. but that is another discussion).
sorry if slighting anyone/everyone, but ENSEMBL is the clear choice IMHO.
$0.02 - transmission costs
On Mon, Jan 11, 2016 at 10:57 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> I think these are all good observations and we may benefit from a
> wider discussion on the support site?
> the abandonment of knownGene seems to have clear implications for
> changing our most visible txdb examples. what should we change to?
> can we make a more future-proof design for these annotation
> On Mon, Jan 11, 2016 at 1:40 PM, Robert Castelo
> <robert.castelo at upf.edu> wrote:
> > hi,
> > On 01/11/2016 04:07 PM, Vincent Carey wrote:
> > [...]
> >> Is it true that there is an asymmetry between Entrez gene ID and
> >> Ensembl gene ID for querying org.Hs.eg.db (I tend to prefer
> >> Homo.sapiens as a symbol mapping resource)? Both ENTREZID and
> >> ENSEMBL are listed as keytypes. My question is whether this
> >> "anchor" concept holds in the current infrastructure.
> > you're right that the infrastructure is probably symmetric at least
> > between Entrez and Ensembl, so maybe i'm not using the term "anchor"
> > correctly here, i'm just referring to the fact that many packages
> > and use cases of BioC are based on, or illustrated using, Entrez IDs.
> > examples are:
> > head(org.Hs.eg.db::keys(org.Hs.eg.db))
> >  "1" "2" "3" "9" "10" "11"
> > i.e., by default the 'keytype' is 'ENTREZID'
> > genefilter::nsFilter() argument 'require.entrez' filters out
> > features without an Entrez Gene ID annotation.
> > Category::categoryToEntrezBuilder() returns a list mapping category
> > ids to the Entrez Gene ids annotated at the category id.
> > SummarizedExperiment::geneRangeMapper() takes a 'TxDb' object and a
> > keytype to map ranges to genes. By default the keytype is 'ENTREZID'
> > some of the workflows are also based on Entrez IDs, such as:
> > http://www.bioconductor.org/help/workflows/variants
> > so if the user just replaces the txdb object in one of those
> > examples or function arguments with a txdb object that does not have
> > Entrez identifiers as primary gene key, those functions, examples or
> > workflows will require modification. this is not necessarily bad,
> > but may put more burden on the user who is learning with a "default"
> > TxDb human gene annotation package.
> > this has been so far the *.UCSC.knownGene using Entrez as gene
> > given the apparent discontinuation of the UCSC known gene track, i
> > suggest making another default gene annotation package available at
> > the BioC site, but one based on Entrez identifiers, given the amount
> > of legacy code and documentation using Entrez in one way or another.
> > an alternative to translating the default Ensembl Gencode
> > identifiers to Entrez would be to just take the NCBI RefSeq
> > annotations as the human gene annotation package available by
> > default, i.e., replacing the current *.UCSC.knownGene by
> > *.UCSC.refGene
> > robert.
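
The translation step Robert mentions (Ensembl gene identifiers to Entrez) can be sketched in base R as below. This is only an illustration under stated assumptions: the three-row lookup table is a hand-written toy, `translate_to_entrez()` is a hypothetical helper that appears nowhere in Bioconductor, and "ENSG_NOT_IN_TABLE" is a made-up identifier used to show what happens to a feature with no Entrez mapping.

```r
# Toy lookup table (illustrative only; a real analysis would query an
# annotation package rather than hard-code identifiers).
ens2entrez <- c(
  ENSG00000141510 = "7157",  # TP53
  ENSG00000012048 = "672",   # BRCA1
  ENSG00000146648 = "1956"   # EGFR
)

# Hypothetical helper: translate Ensembl gene IDs to Entrez gene IDs and
# report which inputs have no mapping, instead of dropping them silently.
translate_to_entrez <- function(ensembl_ids, lookup) {
  entrez <- unname(lookup[ensembl_ids])  # NA where an ID is not in the table
  has_map <- !is.na(entrez)
  list(
    mapped   = setNames(entrez[has_map], ensembl_ids[has_map]),
    unmapped = ensembl_ids[!has_map]
  )
}

# "ENSG_NOT_IN_TABLE" is a made-up identifier for the unmapped case.
res <- translate_to_entrez(
  c("ENSG00000141510", "ENSG_NOT_IN_TABLE", "ENSG00000146648"),
  ens2entrez
)
print(res$mapped)
print(res$unmapped)
```

With the real annotation packages, the lookup itself would more likely go through something like AnnotationDbi::mapIds(org.Hs.eg.db, keys = ids, column = "ENTREZID", keytype = "ENSEMBL"). The point of the sketch is only that any Entrez-based function or workflow fed an Ensembl-keyed TxDb needs a step of this kind, and that unmapped features have to be handled explicitly.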
> [[alternative HTML version deleted]]
> Bioc-devel at r-project.org mailing list