[BioC] GenomicFeatures: makeGeneDbFromBiomart()

Marc Carlson mcarlson at fhcrc.org
Sat Mar 5 00:00:59 CET 2011


Hi Guido,

If you really want to go down the road of making a custom database to
store your biomaRt annotations, then I think you will find that it is
easy enough to do and can be pretty rewarding.  You might find it
helpful to see my slides from the "Using Databases in R" talk at our
most recent course (there are some exercises there as well).

http://www.bioconductor.org/help/course-materials/2011/AdvancedRFeb2011Seattle/
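As a rough idea of what that involves, here is a minimal sketch. It assumes the DBI and RSQLite packages are installed, and it uses a small made-up table (placeholder IDs, not real genes) in place of actual biomaRt output. The table and file names are hypothetical.

```r
library(DBI)

# Made-up rows standing in for a biomaRt query result.
# In practice something like biomaRt::getBM() would produce this (not run here).
annot <- data.frame(
  gene_id     = c("GENE0001", "GENE0002", "GENE0003"),
  gene_symbol = c("abc1", "def2", "ghi3"),
  chromosome  = c("1", "1", "X"),
  stringsAsFactors = FALSE
)

# Create (or open) an SQLite database file; the file name is hypothetical.
db_file <- file.path(tempdir(), "my_annotations.sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)

# Store the table; overwrite = TRUE replaces any earlier copy of 'genes'.
dbWriteTable(con, "genes", annot, overwrite = TRUE)

# Query it back with plain SQL, e.g. all genes on chromosome 1.
chr1_genes <- dbGetQuery(con,
  "SELECT gene_id, gene_symbol FROM genes WHERE chromosome = '1' ORDER BY gene_id")
print(chr1_genes)

dbDisconnect(con)
```

The database file persists between sessions, so later you only need dbConnect() and dbGetQuery() to get at the stored annotations.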

Alternatively (depending on how much data you have to store and how
organized you need it to be), you may just want to save your annotations
in a data.frame as a local .Rda file.  It all depends on your use case
and whether a database is really called for.
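A minimal sketch of the .Rda route, using only base R. The annotation table is made up (placeholder IDs) and stands in for whatever you retrieved from BioMart.

```r
# Made-up annotation table standing in for real biomaRt output.
annot <- data.frame(
  gene_id     = c("GENE0001", "GENE0002", "GENE0003"),
  gene_symbol = c("abc1", "def2", "ghi3"),
  chromosome  = c("1", "1", "X"),
  stringsAsFactors = FALSE
)

rda_file <- file.path(tempdir(), "annot.Rda")
save(annot, file = rda_file)   # writes the object, together with its name, to disk

rm(annot)                      # simulate starting a fresh R session
load(rda_file)                 # recreates 'annot' in the current workspace
str(annot)
```

For a single object, saveRDS()/readRDS() are an alternative that lets you choose the variable name when reading the data back.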


  Marc



On 03/03/2011 05:34 AM, Hooiveld, Guido wrote:
> Hi Marc,
>
> Thank you for your suggestion. However, the combination of makeTranscriptDb + org.xx.eg.db packages won't work in all cases.
> As you likely know, a substantial part of our array analyses is performed on models that are not or less well studied in biomedical research, e.g. pig or a variety of plants (medicago, tomato). As a consequence, the annotation efforts are much less thorough and standardized compared to e.g. human, mouse or rat, and in turn the BioC annotation infrastructure for these less-standard species is (understandably) less well developed.
> Taking pig as an example, although an org.db package is available (org.Ss.eg.db; built Sept 2010), it doesn't (yet?) contain Ensembl-based gene information. Moreover, until very recently (end of Dec 2010) Ensembl had considerably more gene annotation info on the pig genome available than NCBI. I was hoping that having a makeGeneDbFromBiomart() function available would save me the hassle of always manually querying the BioMart website, because a BioC-compliant, Ensembl gene-centered database could then be created (and saved!).
>
> For plants the situation is basically even 'worse': when annotation info is available at all, it is often limited and in a format that I cannot easily access in BioC. I noticed that the low-level function makeTranscriptDb can create a db object from text files, which is ideal for my purpose, except that it is transcript-centered. Often only gene-centered annotation info is available for plants, so I expect to run into problems, since e.g. the splicing info required for the 'splicings' data frame is lacking.
>
> I hope the reasoning behind my question is clear.
>
> Regards,
> Guido
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Marc Carlson
> Sent: Thursday, March 03, 2011 01:16
> To: bioconductor at r-project.org
> Subject: Re: [BioC] GenomicFeatures: makeGeneDbFromBiomart()
>
> Hi Guido,
>
> If you just want gene information, then the
> makeTranscriptDbFromBiomart() function should already give you the gene IDs associated with the transcripts, along with grouping information in convenient GRangesList objects.  Yes, this database is focused on the transcripts and their components, but it is not meant to be isolated from proper gene IDs.
>
> And if you then want to link that information to more classic gene-centric annotations, you might want to look at something like the org.Hs.eg.db package (which includes mappings to Ensembl IDs).
>
> Using these two resources together, our hope was that it should be possible to do a large number of meaningful things.  So what specifically was it that you needed to do?
>
>
>   Marc
>
>
> On 03/01/2011 02:56 AM, Hooiveld, Guido wrote:
>   
>> I noticed that the GenomicFeatures package provides a set of very powerful functions to create databases with transcript-centered annotations from e.g. the BioMart database (makeTranscriptDbFromBiomart).
>> I was wondering whether a function could be added that would allow building a gene-centered annotation database, e.g. 'makeGeneDbFromBiomart()' and/or 'makeGeneDb'.
>> I am asking because I would like to easily retrieve AND store the annotation info of all Ensembl mouse genes. I already had a look at the source code to see whether I could modify parts of it myself to create such a function, but the code is too complicated for me to feel comfortable adapting it. I do have the *feeling* that this is rather straightforward for the more knowledgeable R gurus, hence my question.
>>
>> Thanks in advance for considering,
>> Guido
>>
>> ------------------------------------------------
>> Guido Hooiveld, PhD
>> Nutrition, Metabolism & Genomics Group Division of Human Nutrition 
>> Wageningen University Biotechnion, Bomenweg 2
>> NL-6703 HD Wageningen
>> the Netherlands
>> tel: (+)31 317 485788
>> fax: (+)31 317 483342
>> email:      guido.hooiveld at wur.nl
>> internet:   http://nutrigene.4t.com
>> http://www.researcherid.com/rid/F-4912-2010
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>     