[Bioc-devel] naming of TxDb packages

Hervé Pagès hpages at fhcrc.org
Mon Nov 7 07:38:56 CET 2011


On 11-11-04 08:28 PM, Michael Lawrence wrote:
> Right, there are two things that need to be described: the annotation
> track and the reference genome. The annotation track, if given a
> provider, version and name, always (well, at least in my experience)
> implies a particular genome provider/version.
>
> So:
>
> TxDb.Hsapiens.myMart.64.ccds
>
> Might work?

It probably does.

H.

>
> 2011/11/4 Hervé Pagès <hpages at fhcrc.org>
>
>     Hi Michael,
>
>
>     On 11-11-03 06:36 PM, Michael Lawrence wrote:
>
>         We're actually using a patched version of makeTranscriptDbFromBiomart
>         to get models out of an internal biomart. Patch is on its way to Marc.
>
>         So it would be like: TxDb.Hsapiens.BioMart.hg19.gneGenes?
>
>
>     This suggests that you have a Mart called "hg19" (see below for why).
>
>
>
>         Seems weird to mix the technical mode of data retrieval into the
>         name.
>
>
>     The naming scheme when 'Data source' is "BioMart" seems to be a little
>     bit different. For example, if I use makeTranscriptDbFromBiomart() with
>     biomart="ensembl" and dataset="hsapiens_gene_ensembl", then I get:
>
>      > GenomicFeatures:::.makePackageName(txdb)
>       [1] "TxDb.Hsapiens.BioMart.ensembl.GRCh37.p5"
>
>     Token #4 ("ensembl") is the name of the Mart. I'm a little bit
>     surprised by token #5 though. I would have expected it to be
>     the Ensembl version (possibly followed by the reference genome)
>     because one can always infer the reference genome from the Ensembl
>     version but not the other way around. In other words, if Ensembl
>     makes 2 or more releases based on the same reference genome, our
>     current naming scheme won't differentiate between the 2 TxDb packages.
>     Wouldn't it be better if we had something like:
>
>       TxDb.Hsapiens.BioMart.ensembl.63
>       TxDb.Hsapiens.BioMart.ensembl.64
>
>     Anyway, back to your problem. Yes, in your case the technical mode
>     doesn't really matter, so it's really up to you. Maybe being explicit
>     about the reference genome (with *.UCSC.hg19.*) is more important
>     than the technical mode?
>
>     H.
>
>
>         Michael
>
>         2011/11/3 Hervé Pagès <hpages at fhcrc.org>
>
>
>             Hi Michael,
>
>
>             On 11-11-02 08:58 PM, Michael Lawrence wrote:
>
>                 What are the precise meanings of the tokens in the TxDb
>                 package names? In particular, is "UCSC" the genome provider
>                 or the annotation provider? In the official packages, those
>                 are one and the same, but what if someone wanted to make a
>                 package for custom annotations on a UCSC genome?
>
>
>             The pkg name is generated automatically by internal helper
>             function GenomicFeatures:::.makePackageName(). This function
>             extracts all the tokens from the txdb's metadata table. It looks
>             like the 3rd token in the pkg name is extracted from the 'Data
>             source' field and can only be "UCSC" or "BioMart", typically
>             indicating whether the txdb was made with
>             makeTranscriptDbFromUCSC() or makeTranscriptDbFromBiomart().
>             The first function downloads annotations from the UCSC genome
>             browser using rtracklayer. The 2nd one downloads them with
>             biomaRt from whatever mart/dataset was specified.
>
>             For your custom annotations, the final name of the pkg will
>             depend on what GenomicFeatures:::.makePackageName() finds in
>             the metadata table of your txdb, but if 'Data source' is not
>             "UCSC" or "BioMart", it seems that
>             GenomicFeatures:::.makePackageName() will fail (and not in a
>             very informative way, I'm afraid). If I understand correctly,
>             you are making your custom txdb object with a call to
>             makeTranscriptDb()? If that's the case, make sure you provide
>             enough information through its 'metadata' argument. Maybe you
>             could set 'Data source' to "UCSC" and use some kind of custom
>             name for the table (which in your case is probably not a real
>             UCSC "table"). This custom name will become the last token in
>             the package name. So you would end up with something like:
>
>               TxDb.Hsapiens.UCSC.hg19.GenentechGenes
>
>
>             This solution would have the advantage of having
>             GenomicFeatures:::.makePackageName() work out-of-the-box.
>
>             But maybe it's confusing because it suggests that
>             the txdb was made with makeTranscriptDbFromUCSC()? I hope
>             it's not.
>
>             H.
>
>
>                 Thanks,
>                 Michael
>
>
>                 _______________________________________________
>                 Bioc-devel at r-project.org mailing list
>                 https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>             --
>             Hervé Pagès
>
>             Program in Computational Biology
>             Division of Public Health Sciences
>             Fred Hutchinson Cancer Research Center
>             1100 Fairview Ave. N, M1-B514
>             P.O. Box 19024
>             Seattle, WA 98109-1024
>
>             E-mail: hpages at fhcrc.org
>             Phone: (206) 667-5791
>             Fax: (206) 667-1319
>
>
>
>
>
>




