[Bioc-devel] naming of TxDb packages
Hervé Pagès
hpages at fhcrc.org
Mon Nov 7 07:38:56 CET 2011
On 11-11-04 08:28 PM, Michael Lawrence wrote:
> Right, there are two things that need to be described: the annotation
> track and the reference genome. The annotation track, if given a
> provider, version and name, always (well at least in my experience)
> implies a particular genome provider/version.
>
> So:
>
> TxDb.Hsapiens.myMart.64.ccds
>
> Might work?
It probably does.
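
For what it's worth, here is a minimal base-R sketch of how a name following that scheme could be split back into its parts. splitTxDbName() is a hypothetical helper, not something that exists in GenomicFeatures, and the labels it gives to the tokens are only my reading of the layout proposed above.

```r
# Hypothetical helper (not part of GenomicFeatures): split a TxDb-style
# package name into labelled tokens, assuming the layout
#   TxDb.<organism>.<provider>.<version>.<track>
splitTxDbName <- function(pkgname) {
  tokens <- strsplit(pkgname, ".", fixed = TRUE)[[1]]
  stopifnot(length(tokens) >= 5L, tokens[1] == "TxDb")
  n <- length(tokens)
  list(
    organism = tokens[2],
    provider = tokens[3],
    # A version string may itself contain dots, so everything between
    # the provider and the last token is glued back together.
    version  = paste(tokens[4:(n - 1L)], collapse = "."),
    track    = tokens[n]
  )
}

parts <- splitTxDbName("TxDb.Hsapiens.myMart.64.ccds")
str(parts)
```

The point is only that the provider, its version and the track name can each be read off the name without knowing how the data were fetched.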
H.
>
> 2011/11/4 Hervé Pagès <hpages at fhcrc.org>
>
> Hi Michael,
>
>
> On 11-11-03 06:36 PM, Michael Lawrence wrote:
>
> We're actually using a patched version of
> makeTranscriptDbFromBiomart to
> get models out of an internal biomart. Patch is on its way to Marc.
>
> So it would be like: TxDb.Hsapiens.BioMart.hg19.gneGenes?
>
>
> This suggests that you have a Mart called "hg19" (see below why).
>
>
>
> Seems weird to mix the technical mode of data retrieval into the
> name.
>
>
> The naming scheme when 'Data source' is "BioMart" seems to be a little
> bit different. For example, if I use makeTranscriptDbFromBiomart() with
> biomart="ensembl" and dataset="hsapiens_gene_ensembl", then I get:
>
> > GenomicFeatures:::.makePackageName(txdb)
> [1] "TxDb.Hsapiens.BioMart.ensembl.GRCh37.p5"
>
> Token #4 ("ensembl") is the name of the Mart. I'm a little bit
> surprised by token #5 though. I would have expected it to be
> the Ensembl version (possibly followed by the reference genome)
> because one can always infer the reference genome from the Ensembl
> version but not the other way around. In other words, if Ensembl
> makes 2 or more releases based on the same reference genome, our
> current naming scheme won't differentiate the resulting TxDb packages.
> Wouldn't it be better if we had something like:
>
> TxDb.Hsapiens.BioMart.ensembl.63
> TxDb.Hsapiens.BioMart.ensembl.64
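
To make the collision concrete, here is a small base-R sketch. The two metadata lists and the nameBy() helper are made up for illustration (this is not how the package name is actually generated), and the genome string shared by the two releases is an assumption chosen only to show the problem.

```r
# Made-up metadata for two releases; the shared genome string is an
# assumption for the sake of the example.
releases <- list(
  list(mart = "ensembl", release = "63", genome = "GRCh37.p5"),
  list(mart = "ensembl", release = "64", genome = "GRCh37.p5")
)

# Hypothetical helper: build a name whose last part is the given field.
nameBy <- function(md, field) {
  paste("TxDb", "Hsapiens", "BioMart", md$mart, md[[field]], sep = ".")
}

byGenome  <- vapply(releases, nameBy, character(1), field = "genome")
byRelease <- vapply(releases, nameBy, character(1), field = "release")

# Names keyed on the genome clash; names keyed on the release do not.
print(anyDuplicated(byGenome) > 0)
print(anyDuplicated(byRelease) > 0)
```

Keying the name on the release number keeps one package per release, and the genome can still be recorded in the txdb metadata.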
>
> Anyway, back to your problem. Yes in your case the technical mode
> doesn't really matter so it's really up to you. Maybe being explicit
> about the reference genome (with *.UCSC.hg19.*) is more important
> than the technical mode?
>
> H.
>
>
> Michael
>
> 2011/11/3 Hervé Pagès <hpages at fhcrc.org>
>
>
> Hi Michael,
>
>
> On 11-11-02 08:58 PM, Michael Lawrence wrote:
>
> What are the precise meanings of the tokens in the TxDb package
> names? In particular, is "UCSC" the genome provider or the annotation
> provider? In the official packages, those are one and the same, but
> what if someone wanted to make a package for custom annotations on a
> UCSC genome?
>
>
> The pkg name is generated automatically by the internal helper
> function GenomicFeatures:::.makePackageName(). This function
> extracts all the tokens from the txdb's metadata table. It looks
> like the 3rd token in the pkg name is extracted from the 'Data
> source' field and can only be "UCSC" or "BioMart", typically
> indicating whether the txdb was made with makeTranscriptDbFromUCSC()
> or makeTranscriptDbFromBiomart(). The first function downloads
> annotations from the UCSC genome browser using rtracklayer. The 2nd
> one downloads them with biomaRt from whatever mart/dataset was
> specified.
>
> For your custom annotations, the final name of the pkg will depend on
> what GenomicFeatures:::.makePackageName() finds in the metadata
> table of your txdb, but, if 'Data source' is not "UCSC" or "BioMart",
> it seems that GenomicFeatures:::.makePackageName() will fail (and not
> in a very informative way I'm afraid). If I understand correctly, you
> are making your custom txdb object with a call to makeTranscriptDb()?
> If that's the case, make sure you provide enough information
> thru its 'metadata' argument. Maybe you could set 'Data source' to
> "UCSC" and use some kind of custom name for the table (which in your
> case is probably not a real UCSC "table"). This custom name will
> become the last token in the package name. So you would end up with
> something like:
>
> TxDb.Hsapiens.UCSC.hg19.GenentechGenes
>
>
> This solution would have the advantage of having
> GenomicFeatures:::.makePackageName() work out-of-the-box.
>
> But maybe it's confusing because it suggests that
> the txdb was made with makeTranscriptDbFromUCSC()? I hope
> it's not.
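
A rough sketch of what such a 'metadata' data frame might look like, in base R only. As far as I recall, the 'metadata' argument takes a 2-column data frame with character columns "name" and "value". Of the field names below, only 'Data source' is mentioned above; "Genome" and "UCSC Table" are my assumption about which fields the name is built from, and field() is a hypothetical lookup helper.

```r
# Sketch of a 'metadata' data frame; field names other than "Data source"
# are assumptions, and "GenentechGenes" is just the custom table name
# suggested above.
metadata <- data.frame(
  name  = c("Data source", "Genome", "UCSC Table"),
  value = c("UCSC", "hg19", "GenentechGenes"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup helper: fetch a field's value by its name.
field <- function(md, key) md$value[md$name == key]

# Assemble the tokens in the order they appear in the package name.
pkgname <- paste("TxDb", "Hsapiens",
                 field(metadata, "Data source"),
                 field(metadata, "Genome"),
                 field(metadata, "UCSC Table"),
                 sep = ".")
print(pkgname)
```

This only mimics the naming step; whether the real helper accepts such a table would need to be checked against the txdb actually produced.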
>
> H.
>
>
> Thanks,
> Michael
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319