[Bioc-devel] naming of TxDb packages

Hervé Pagès hpages at fhcrc.org
Fri Nov 4 18:37:23 CET 2011


Hi Michael,

On 11-11-03 06:36 PM, Michael Lawrence wrote:
> We're actually using a patched version of makeTranscriptDbFromBiomart to
> get models out of an internal biomart.  Patch is on its way to Marc.
>
> So it would be like: TxDb.Hsapiens.BioMart.hg19.gneGenes?

This suggests that you have a Mart called "hg19" (see below for why).

>
> Seems weird to mix the technical mode of data retrieval into the name.

The naming scheme when 'Data source' is "BioMart" seems to be a little
bit different. For example, if I use makeTranscriptDbFromBiomart() with
biomart="ensembl" and dataset="hsapiens_gene_ensembl", then I get:

   > GenomicFeatures:::.makePackageName(txdb)
   [1] "TxDb.Hsapiens.BioMart.ensembl.GRCh37.p5"

Token #4 ("ensembl") is the name of the Mart. I'm a little bit
surprised by token #5 though. I would have expected it to be
the Ensembl version (possibly followed by the reference genome)
because one can always infer the reference genome from the Ensembl
version but not the other way around. In other words, if Ensembl
makes 2 or more releases based on the same reference genome, our
current naming scheme won't differentiate between the resulting
TxDb packages.
Wouldn't it be better if we had something like:

   TxDb.Hsapiens.BioMart.ensembl.63
   TxDb.Hsapiens.BioMart.ensembl.64
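
To make the collision concrete, here is a minimal sketch in base R. It
does not call GenomicFeatures: make_name() is a hypothetical helper that
just pastes tokens together, and pairing both releases 63 and 64 with
GRCh37.p5 is an assumption made for illustration only.

```r
# Hypothetical helper (not part of GenomicFeatures): join name tokens with dots.
make_name <- function(...) paste("TxDb", ..., sep = ".")

# Two Ensembl releases assumed, for illustration, to share one reference genome.
releases <- data.frame(
  version = c("63", "64"),
  genome  = c("GRCh37.p5", "GRCh37.p5"),
  stringsAsFactors = FALSE
)

# Current scheme: the last token is the reference genome.
by_genome <- make_name("Hsapiens", "BioMart", "ensembl", releases$genome)

# Proposed scheme: the last token is the Ensembl version.
by_version <- make_name("Hsapiens", "BioMart", "ensembl", releases$version)

anyDuplicated(by_genome) > 0   # TRUE: both releases get the same name
anyDuplicated(by_version) > 0  # FALSE: each release gets its own name
```

With the version as the last token, the two packages could be installed
side by side.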

Anyway, back to your problem. Yes, in your case the technical mode
doesn't really matter, so it's really up to you. Maybe being explicit
about the reference genome (with *.UCSC.hg19.*) is more important
than the technical mode?
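
If you go that route, the name could be previewed from the metadata
before building anything. The sketch below is base R only and rests on
assumptions: the two-column name/value layout, the field names, and the
helpers get_field(), abbreviate_organism() and preview_name() are all
illustrative. It is not what .makePackageName() actually does.

```r
# Illustrative metadata; the layout and the field names are assumptions.
metadata <- data.frame(
  name  = c("Data source", "Genome", "Organism", "Table"),
  value = c("UCSC", "hg19", "Homo sapiens", "GenentechGenes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one field, failing loudly if it is missing.
get_field <- function(md, field) {
  value <- md$value[md$name == field]
  if (length(value) != 1L)
    stop("metadata must contain exactly one '", field, "' entry")
  value
}

# Hypothetical helper: abbreviate "Homo sapiens" to "Hsapiens".
abbreviate_organism <- function(organism) {
  parts <- strsplit(organism, " ", fixed = TRUE)[[1]]
  paste0(substr(parts[1], 1, 1), parts[2])
}

# Hypothetical helper: assemble the five dot-separated tokens.
preview_name <- function(md) {
  paste("TxDb",
        abbreviate_organism(get_field(md, "Organism")),
        get_field(md, "Data source"),
        get_field(md, "Genome"),
        get_field(md, "Table"),
        sep = ".")
}

preview_name(metadata)
```

A missing field stops with an explicit message, which is the kind of
feedback one would want when the metadata is incomplete.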

H.

>
> Michael
>
> 2011/11/3 Hervé Pagès <hpages at fhcrc.org>
>
>     Hi Michael,
>
>
>     On 11-11-02 08:58 PM, Michael Lawrence wrote:
>
>         What are the precise meanings of the tokens in the TxDb package
>         names? In particular, is "UCSC" the genome provider or the
>         annotation provider? In the official packages, those are one
>         and the same, but what if someone wanted to make a package for
>         custom annotations on a UCSC genome?
>
>
>     The pkg name is generated automatically by the internal helper function
>     GenomicFeatures:::.makePackageName(). This function extracts all the
>     tokens from the txdb's metadata table. It looks like the 3rd token
>     in the pkg name is extracted from the 'Data source' field and can only
>     be "UCSC" or "BioMart", typically indicating whether the txdb was made
>     with makeTranscriptDbFromUCSC() or makeTranscriptDbFromBiomart().
>     The first function downloads annotations from the UCSC genome
>     browser using rtracklayer. The 2nd one downloads them with biomaRt
>     from whatever mart/dataset was specified.
>
>     For your custom annotations, the final name of the pkg will depend on
>     what GenomicFeatures:::.makePackageName() finds in the metadata
>     table of your txdb, but, if 'Data source' is not "UCSC" or "BioMart",
>     it seems that GenomicFeatures:::.makePackageName() will fail (and not
>     in a very informative way I'm afraid). If I understand correctly, you
>     are making your custom txdb object with a call to makeTranscriptDb()?
>     If that's the case, make sure you provide enough information
>     through its 'metadata' argument. Maybe you could set 'Data source' to
>     "UCSC" and use some kind of custom name for the table (which in your
>     case is probably not a real UCSC "table"). This custom name will become
>     the last token in the package name. So you would end up with something
>     like:
>
>       TxDb.Hsapiens.UCSC.hg19.GenentechGenes
>
>     This solution would have the advantage of having
>     GenomicFeatures:::.makePackageName() work out-of-the-box.
>     But maybe it's confusing because it suggests that
>     the txdb was made with makeTranscriptDbFromUCSC()? I hope
>     it's not.
>
>     H.
>
>
>         Thanks,
>         Michael
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


