[Bioc-devel] Known Genes replaced by GENCODE genes at UCSC

Paul Grosu pgrosu at gmail.com
Mon Jan 11 21:40:25 CET 2016


Tim, you always crack me up! :)  I totally agree, and it would probably be
good to also have tools that download directly from Ensembl, NCBI, cloud
annotation sources, etc., and build/update the AnnDbBimap objects.  This
way the annotation sources can maintain the data and we can maintain the
scripts, while still shipping pre-built AnnDbBimap objects just in case.

~p

-----Original Message-----
From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Tim Triche, Jr.
Sent: Monday, January 11, 2016 2:02 PM
To: Vincent Carey
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Known Genes replaced by GENCODE genes at UCSC

ENSEMBL

knownGene was always a disaster.  For extra amusement/horror, be sure to
check out the sad saga of the TCGA GAF and its disconnection from knownGene
as well as from reality.  Three cheers for rendering transcript-level
estimates useless (and no, this was not Katie's fault).

Rainer and many others have made a herculean effort to bring all the BioC
annotation infrastructure into the 21st century... having worked with
Kallisto extensively of late, I see no reason to use a non-ENSEMBL
"conservative" reference transcriptome (I see plenty of reasons to use
miTranscriptome, etc. but that is another discussion).

sorry if this slights anyone/everyone, but ENSEMBL is the clear choice IMHO.

$0.02 - transmission costs


--t

On Mon, Jan 11, 2016 at 10:57 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:

> I think these are all good observations and we may benefit from a 
> wider discussion on the support site?
>
> the abandonment of knownGene seems to have clear implications for 
> changing our most visible txdb examples.  what should we change to?  
> can we make a more future-proof design for these annotation 
> selections?
>
> On Mon, Jan 11, 2016 at 1:40 PM, Robert Castelo <robert.castelo at upf.edu> wrote:
>
> > hi,
> >
> > On 01/11/2016 04:07 PM, Vincent Carey wrote:
> > [...]
> >
> >> Is it true that there is an asymmetry between Entrez gene ID and 
> >> Ensembl gene ID for querying org.Hs.eg.db (I tend to prefer 
> >> Homo.sapiens as a symbol mapping resource)?  Both ENTREZID and 
> >> ENSEMBL are listed as keytypes.  My question is whether this 
> >> "anchor" concept holds in the current infrastructure.
> >>
> >
> > you're right that the infrastructure is probably symmetric, at least
> > between Entrez and Ensembl, so maybe i'm not using the term "anchor"
> > correctly here. i'm just referring to the fact that many package
> > functions and BioC use cases are based on, or illustrated using,
> > Entrez IDs. examples are:
> > examples are:
> >
> > head(org.Hs.eg.db::keys(org.Hs.eg.db))
> > [1] "1"  "2"  "3"  "9"  "10" "11"
> >
> > i.e., by default the 'keytype' is 'ENTREZID'
> >
> > genefilter::nsFilter() argument 'require.entrez' filters out 
> > features without an Entrez Gene ID annotation.
> >
> > Category::categoryToEntrezBuilder() returns a list mapping category ids
> > to the Entrez Gene ids annotated to each category id.
> >
> > SummarizedExperiment::geneRangeMapper() takes a 'TxDb' object and a 
> > keytype to map ranges to genes. By default the keytype is 'ENTREZID'
> >
> > some of the workflows are also based on Entrez IDs, such as:
> >
> >
> > http://www.bioconductor.org/help/workflows/annotation/Annotation_Resources
> >
> > http://www.bioconductor.org/help/workflows/variants
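
A minimal R sketch of the symmetry question raised above, assuming the org.Hs.eg.db package (and hence AnnotationDbi) is installed. It uses AnnotationDbi's keytypes(), keys() and mapIds() to show that ENTREZID and ENSEMBL are both listed as keytypes and that either can drive a lookup, while keys() without an explicit keytype returns the package's central key (Entrez IDs). The two Entrez IDs ("1", "2") are arbitrary illustration values, not taken from the thread's workflows.

```r
# Sketch only; assumes org.Hs.eg.db (and therefore AnnotationDbi) is installed.
suppressPackageStartupMessages(library(org.Hs.eg.db))

# Both identifier systems are listed as keytypes.
kt <- keytypes(org.Hs.eg.db)
print(c("ENTREZID", "ENSEMBL") %in% kt)

# Without an explicit keytype, keys() returns the central key (Entrez IDs).
print(head(keys(org.Hs.eg.db)))

# Entrez -> Ensembl; multiVals = "first" keeps one match per key.
ens <- mapIds(org.Hs.eg.db, keys = c("1", "2"), column = "ENSEMBL",
              keytype = "ENTREZID", multiVals = "first")

# Ensembl -> Entrez, starting from the non-missing results above.
eg <- mapIds(org.Hs.eg.db, keys = unname(ens[!is.na(ens)]),
             column = "ENTREZID", keytype = "ENSEMBL", multiVals = "first")
print(ens)
print(eg)
```

The lookup machinery itself is symmetric. The Entrez "anchor" shows up only in defaults such as the keytype that keys() falls back to.
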
> >
> > so if the user just replaces the txdb object in one of those
> > examples or function arguments with a txdb object that does not have
> > Entrez identifiers as its primary gene key, those functions, examples
> > or workflows will require modification. this is not necessarily bad,
> > but it may put more burden on the user who is learning with a "default"
> > TxDb human gene annotation package, which so far has been
> > *.UCSC.knownGene, using Entrez as gene identifiers.
> > given the apparent discontinuation of the UCSC known gene track, i would
> > suggest making another default gene annotation package available on the
> > BioC site, but one based on Entrez identifiers, given the amount of
> > legacy code and documentation using Entrez in one way or another.
> >
> > an alternative to translating the default Ensembl/GENCODE identifiers
> > into Entrez would be to simply take the NCBI RefSeq annotations as the
> > human gene annotation package available by default, i.e., replacing the
> > current *.UCSC.knownGene with *.UCSC.refGene
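
A rough R sketch of how the refGene alternative could be tried locally. It assumes GenomicFeatures is installed and the UCSC servers are reachable. In recent Bioconductor releases makeTxDbFromUCSC() lives in the txdbmaker package, so the code prefers that package when it is available. The hg19 genome is an assumption chosen to match the TxDb.Hsapiens.UCSC.hg19.knownGene era and is not a recommendation from the thread.

```r
# Sketch only; assumes GenomicFeatures is installed and network access to UCSC.
suppressPackageStartupMessages(library(GenomicFeatures))

# makeTxDbFromUCSC() moved to txdbmaker in newer releases; use it if present.
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromUCSC
} else {
  GenomicFeatures::makeTxDbFromUCSC
}

# Build a TxDb from the UCSC refGene (NCBI RefSeq) table for hg19.
txdb <- make_txdb(genome = "hg19", tablename = "refGene")

# Check which keytypes the TxDb exposes and look at a few gene keys.
print(keytypes(txdb))
gene_ids <- keys(txdb, keytype = "GENEID")
print(head(gene_ids))

# Gene-level ranges, named by GENEID.
gr <- genes(txdb)
print(gr)
```

Inspecting the GENEID keys shows which identifier system a candidate default package would expose to downstream code such as the Entrez-based functions listed above.
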
> >
> >
> >
> > robert.
> >
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
