[Bioc-devel] Known Genes replaced by GENCODE genes at UCSC

Tim Triche, Jr. tim.triche at gmail.com
Mon Jan 11 20:02:27 CET 2016


ENSEMBL

knownGene was always a disaster.  For extra amusement/horror, be sure to
check out the sad saga of the TCGA GAF and its disconnection from
knownGene as well as from reality.  Three cheers for rendering
transcript-level estimates useless (and no, this was not Katie's fault).

Rainer and many others have made a herculean effort to bring all the BioC
annotation infrastructure into the 21st century... having worked with
Kallisto extensively of late, I see no reason to use a non-ENSEMBL
"conservative" reference transcriptome (I see plenty of reasons to use
miTranscriptome, etc. but that is another discussion).

sorry if I'm slighting anyone/everyone, but ENSEMBL is the clear choice IMHO.

$0.02 - transmission costs


--t

On Mon, Jan 11, 2016 at 10:57 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

> I think these are all good observations and we may benefit from a wider
> discussion on the support site?
>
> the abandonment of knownGene seems to have clear implications for changing
> our most visible txdb examples.  what should we change to?  can we make a
> more future-proof design for these annotation selections?
>
> On Mon, Jan 11, 2016 at 1:40 PM, Robert Castelo <robert.castelo at upf.edu>
> wrote:
>
> > hi,
> >
> > On 01/11/2016 04:07 PM, Vincent Carey wrote:
> > [...]
> >
> >> Is it true that there is an asymmetry between Entrez gene ID and Ensembl
> >> gene ID for querying org.Hs.eg.db (I tend to prefer Homo.sapiens
> >> as a symbol mapping resource)?  Both ENTREZID and ENSEMBL are listed as
> >> keytypes.  My question is whether this "anchor" concept
> >> holds in the current infrastructure.
> >>
> >
> > you're right that the infrastructure is probably symmetric, at least
> > between Entrez and Ensembl, so maybe i'm not using the term "anchor"
> > correctly here. i'm just referring to the fact that many package functions
> > and use cases of BioC are based on, or illustrated using, Entrez IDs.
> > examples are:
> >
> > head(AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db))
> > [1] "1"  "2"  "3"  "9"  "10" "11"
> >
> > i.e., by default the 'keytype' is 'ENTREZID'
> >
> > genefilter::nsFilter() argument 'require.entrez' filters out features
> > without an Entrez Gene ID annotation.
> >
> > Category::categoryToEntrezBuilder() returns a list mapping category ids to
> > the Entrez Gene ids annotated to each category id.
> >
> > SummarizedExperiment::geneRangeMapper() takes a 'TxDb' object and a
> > keytype to map ranges to genes. By default the keytype is 'ENTREZID'.
> >
> > some of the workflows are also based on Entrez IDs, such as:
> >
> > http://www.bioconductor.org/help/workflows/annotation/Annotation_Resources
> >
> > http://www.bioconductor.org/help/workflows/variants
> >
> > so if the user just replaces the txdb object in one of those examples or
> > function arguments with a txdb object that does not have Entrez
> > identifiers as its primary gene key, those functions, examples or
> > workflows will require modification. this is not necessarily bad, but it
> > may put more burden on the user who is learning with a "default" TxDb
> > human gene annotation package, which so far has been *.UCSC.knownGene,
> > using Entrez as gene identifiers. given the apparent discontinuation of
> > the UCSC known gene track, i would suggest making another default gene
> > annotation package available on the BioC site, but one based on Entrez
> > identifiers, given the amount of legacy code and documentation using
> > Entrez in one way or another.
> >
> > an alternative to translating the default Ensembl Gencode identifiers
> > into Entrez would be to simply use the NCBI RefSeq annotations as the
> > default human gene annotation package, i.e., replacing the current
> > *.UCSC.knownGene with *.UCSC.refGene.
> >
> >
> >
> > robert.
> >
>
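Robert's point about translating Ensembl/Gencode gene IDs into Entrez IDs, so that Entrez-based examples keep working, can be sketched in base R. This is a minimal sketch, not code from the thread: it assumes a data.frame with ENSEMBL and ENTREZID columns, such as the one AnnotationDbi::select(org.Hs.eg.db, keys = ids, columns = "ENTREZID", keytype = "ENSEMBL") returns, and the helper name ensemblToEntrez() is hypothetical. Where one Ensembl ID maps to several Entrez IDs it keeps the first, broadly like AnnotationDbi::mapIds(..., multiVals = "first").

```r
# Hypothetical helper (not part of any Bioconductor package).
# Translate Ensembl gene IDs to Entrez gene IDs, keeping one Entrez ID per
# Ensembl ID (the first one listed) and NA where no mapping exists.
# 'map' is assumed to be a data.frame with columns ENSEMBL and ENTREZID,
# e.g. as returned by AnnotationDbi::select(org.Hs.eg.db, keys = ids,
#   columns = "ENTREZID", keytype = "ENSEMBL").
ensemblToEntrez <- function(ids, map) {
  stopifnot(all(c("ENSEMBL", "ENTREZID") %in% names(map)))
  # drop rows without an Entrez ID so they cannot win the "first" match
  map <- map[!is.na(map$ENTREZID), , drop = FALSE]
  # match() gives the index of the first occurrence, or NA if absent
  idx <- match(ids, map$ENSEMBL)
  out <- as.character(map$ENTREZID[idx])
  names(out) <- ids
  out
}
```

Keeping only the first match is a lossy choice; when the 1:many cases matter (select() reports them with a message), inspecting the full select() result is safer than collapsing it.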
