[BioC] defining accessors (cols, keys, keytypes, select) for annotation package built with AnnotationForge package

Tue Aug 20 09:40:59 CEST 2013

Marc Carlson <mcarlson at ...> writes:

> 
> On 08/14/2013 04:58 AM, Sashi wrote:
> > Marc Carlson <mcarlson <at> ...> writes:
> >
> >> Hi Sashi,
> >>
> >> The PDF from Gabor that you are looking at is much older and was from
> >> before we even had the select method.  These days you probably don't
> >> want to do that.  Especially if you want to implement a method like
> >> select().  I strongly suspect that you really just want to be looking at
> >> this vignette instead:
> >>
> >> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf
> >>
> >> To answer your questions, GO is actually looking at a view that is
> >> created in the database of the three GO tables (one for BP, MF and CC).
> >> But you probably don't need that level of detail.  If you are using
> >> org.At.tair.db to look at arabidopsis, then you may already have
> >> everything you need.  And if you need another organism, you probably
> >> want to look 1st at making an org package using
> >> makeOrgPackageFromNCBI().  And if for some reason you want to expose
> >> some entirely new database resource (IOW you don't want to make an
> >> organism package but something else entirely), then you might need to
> >> use the vignette above.
> >>
> >> I hope this helps you,
> >>
> >>     Marc
> >>
> >> On 08/13/2013 04:33 AM, Rameswara Sashi Kiran Challa wrote:
> >>> Hi ,
> >>>
> >>> I am trying to build an annotation organism package by using the
> >>> AnnotationForge package. I followed this document
> >>> <http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/NewSchema.pdf>
> >>> written by Gabor Csardi.
> >>> I was able to build a sqlite database and create an Annotation package
> >>> using the makeAnnDbPkg() function.
> >>>
> >>> I understand cols(), keys(), keytypes() and select() are set as generic
> >>> methods in AnnotationDbi.
> >>>
> >>> When I look into the methods-AnnotationDb.R script in the AnnotationDbi
> >>> package, I see the cols() method is set and it actually reads all the
> >>> columns of all the tables in the sqlite database.
> >>>
> >>> When I run cols() on org.At.tair.db I get a few values which are
> >>> actually not field/column names in the sqlite db. For example, there is
> >>> no table called "GO" in the org.At.tair.sqlite database. I am unable to
> >>> understand how it creates these values. Could someone please help me
> >>> understand how and where exactly these accessor functions are defined,
> >>> and how and where they are to be modified to be able to access the data
> >>> in the sqlite db that I am creating for the organism I am working on?
> >>>
> >>> ==========================
> >>>
> >>> > cols(org.At.tair.db)
> >>>  [1] "TAIR"         "CHRLOC"       "CHRLOCEND"    "ENZYME"       "PATH"
> >>>  [6] "PMID"         "REFSEQ"       "SYMBOL"       "GENENAME"     "GO"
> >>> [11] "EVIDENCE"     "ONTOLOGY"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"
> >>> [16] "ARACYC"       "ARACYCENZYME" "ENTREZID"     "CHR"
> >>> =======================================
> >>>
> >>> Please point me to any documentation available for the same.
> >>>
> >>> Thanks for your time,
> >>> Sashi
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor <at> ...
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > Hi Marc,
> >
> > Thanks for your prompt reply. Referring to the document you pointed me to,
> > I created another R script within the organism package skeleton (an R
> > script apart from zzz.R) and set the cols and keytypes accessor methods.
> >
> > Bimaps are created as part of every annotation package. How do we use
> > these Bimaps in these accessor methods? Am I right in thinking that these
> > Bimaps are to be used in these accessor methods? Or do those Bimaps have
> > to be accessed only via the get(), mget() and toTable() methods?
> >
> > Also, can you please let me know if there is any documentation available
> > on how the GO views are created? I see there are separate tables like
> > go_cc, go_mf, go_bp, etc. in the Arabidopsis annotation package. Is it
> > necessary to have tables like go_cc, go_mf, go_bp and go_mf_all in the
> > sqlite database for the customized annotation package I am creating?
> > Would a single table for all GO annotations not suffice?
> >
> > Thanks again for your time,
> > Sashi
> Hi Sashi,
> 
> I really doubt that you need to think about bimaps at all.  You don't 
> need them to implement select, cols, keytypes or keys.  And they are 
> really only still supported for the sake of older legacy code.  The get, 
> mget, and toTable methods are defined to help with bimaps, but you 
> probably don't need to use these methods anyways. So it's very unlikely 
> that you would even need to use bimaps let alone implement them.
> 
> And the go view is just a SQLite database view.  A view is sort of like 
> a pre-canned database query that appears as a table.  Our "go view" is 
> really just the union of go_bp, go_mf, and go_cc tables. Those three 
> separate tables allow us to still keep the different terms (from the 
> different ontologies) as separate from each other in the database.  But 
> since we are using a view, we can also easily query all three of them 
> (as if they were lumped together) WITHOUT actually duplicating all that 
> data into another enormous table.  And the performance for this is still 
> great.
> 
> You can read a bit about how SQLite views are created here if you are 
> curious:
> 
> http://www.sqlite.org/lang_createview.html
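
A minimal sketch of the view idea described above, assuming the DBI and RSQLite packages are installed. Only the table names go_bp, go_mf and go_cc come from this thread; the column layout (gene_id, go_id, evidence) and the rows are simplified, made-up stand-ins and not the real org.At.tair.db schema.

```r
library(DBI)

# In-memory SQLite database, so nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Three separate tables, one per ontology (hypothetical, simplified columns).
for (tbl in c("go_bp", "go_mf", "go_cc")) {
  dbExecute(con, sprintf(
    "CREATE TABLE %s (gene_id TEXT, go_id TEXT, evidence TEXT)", tbl))
}

# Made-up illustrative rows, not real annotations.
dbExecute(con, "INSERT INTO go_bp VALUES ('gene1', 'GO:bp_example', 'EV1')")
dbExecute(con, "INSERT INTO go_mf VALUES ('gene1', 'GO:mf_example', 'EV2')")
dbExecute(con, "INSERT INTO go_cc VALUES ('gene2', 'GO:cc_example', 'EV3')")

# The view is a stored query: it looks like one table named 'go',
# but no rows are copied out of the three underlying tables.
dbExecute(con, "
  CREATE VIEW go AS
    SELECT gene_id, go_id, evidence, 'BP' AS ontology FROM go_bp
    UNION ALL
    SELECT gene_id, go_id, evidence, 'MF' AS ontology FROM go_mf
    UNION ALL
    SELECT gene_id, go_id, evidence, 'CC' AS ontology FROM go_cc")

# Query the view as if it were a single combined table.
go_rows <- dbGetQuery(con, "SELECT * FROM go ORDER BY ontology")
print(go_rows)

dbDisconnect(con)
```

This would explain why an accessor can report something like "GO" even though no physical table of that name holds the data: the name refers to the view, and the per-ontology tables stay separate underneath.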
> 
> But if you are making an org package, why not just use 
> makeOrgPackageFromNCBI?
> 
>    Marc
> 

Thanks a lot Marc!!

It's good to know that the Bioconductor community is trying to move away
from Bimaps and adopt the cols, select, keys and keytypes methods for any
sort of query. Are the reverse maps that are part of Bimaps taken care of
by these accessors?

As I understand it, makeOrgPackageFromNCBI is perhaps still generating
Bimaps for the sake of older legacy code, and in the near future perhaps
all annotation packages will just have a sqlite database and these
accessors defined. Am I correct?

I had started by looking at how to build a sqlite db with some of the
mappings we have, and had not used the makeOrgPackageFromNCBI function. My
thinking was that understanding how the sqlite db is built would enable me
to add any new mappings that are not part of NCBI.

So, to summarize, for annotation package development one approach is to use
makeOrgPackageFromNCBI() and the other is to make a sqlite db and then
define these accessors, as described in the PDF you linked earlier. And
there is no need for any Bimaps in the package development as such.
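
For reference, a rough sketch of how these accessors are called on an existing org package, assuming AnnotationDbi and org.At.tair.db are installed and that "SYMBOL" is offered as a keytype (the code checks this rather than taking it for granted). Later AnnotationDbi releases renamed cols() to columns(), which is what the sketch uses; on an older installation cols() would be the equivalent call.

```r
library(AnnotationDbi)
library(org.At.tair.db)

# What can be used as a lookup key, and what can be retrieved.
kt <- keytypes(org.At.tair.db)
available <- columns(org.At.tair.db)  # cols() in releases from 2013

# Guard for the assumption stated above.
stopifnot(all(c("TAIR", "SYMBOL") %in% kt))

# Forward direction: start from TAIR identifiers, retrieve symbols.
tair_ids <- head(keys(org.At.tair.db, keytype = "TAIR"))
forward <- select(org.At.tair.db, keys = tair_ids,
                  columns = "SYMBOL", keytype = "TAIR")

# Reverse direction: start from symbols, retrieve TAIR identifiers.
# Only the keytype changes; no Bimap or reverse map object is involved.
symbols <- head(keys(org.At.tair.db, keytype = "SYMBOL"))
reverse <- select(org.At.tair.db, keys = symbols,
                  columns = "TAIR", keytype = "SYMBOL")

print(forward)
print(reverse)
```

The design point is that direction is a property of the query (which keytype you start from) rather than of a separate map object.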

Thanks for your time,
-Sashi