[BioC] defining accessors (cols, keys, keytypes, select) for annotation package built with AnnotationForge package
Sashi
schalla at indiana.edu
Tue Aug 20 09:40:59 CEST 2013
Marc Carlson <mcarlson at ...> writes:
>
> On 08/14/2013 04:58 AM, Sashi wrote:
> > Marc Carlson <mcarlson <at> ...> writes:
> >
> >> Hi Sashi,
> >>
> >> The PDF from Gabor that you are looking at is much older and was from
> >> before we even had the select method. These days you probably don't
> >> want to do that. Especially if you want to implement a method like
> >> select(). I strongly suspect that you really just want to be looking at
> >> this vignette instead:
> >>
> >> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf
> >>
> >> To answer your questions, GO is actually looking at a view that is
> >> created in the database of the three GO tables (one for BP, MF and CC).
> >> But you probably don't need that level of detail. If you are using
> >> org.At.tair.db to look at arabidopsis, then you may already have
> >> everything you need. And if you need another organism, you probably
> >> want to look 1st at making an org package using
> >> makeOrgPackageFromNCBI(). And if for some reason you want to expose
> >> some entirely new database resource (IOW you don't want to make an
> >> organism package but something else entirely), then you might need to
> >> use the vignette above.
> >>
> >> I hope this helps you,
> >>
> >> Marc
> >>
> >> On 08/13/2013 04:33 AM, Rameswara Sashi Kiran Challa wrote:
> >>> Hi ,
> >>>
> >>> I am trying to build an annotation organism package by using the
> >>> AnnotationForge package. I followed this document
> >>> <http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/NewSchema.pdf>
> >>> written by Gabor Csardi.
> >>> I was able to build a sqlite database and create an Annotation package
> >>> using the makeAnnDbPkg() function.
> >>>
> >>> I understand cols(), keys(), keytypes() and select() are set as generic
> >>> methods in AnnotationDbi.
> >>>
> >>> When I look into the methods-AnnotationDb.R script in the AnnotationDbi
> >>> package, I see the cols() method is set and it actually reads all the
> >>> columns of all the tables in the sqlite database.
> >>>
> >>> When I run cols() on org.At.tair.db I get a few values which are
> >>> actually not field/column names in the sqlite db. E.g. there is no table
> >>> called "GO" in the org.At.tair.sqlite database. I am unable to understand
> >>> how it creates these values. Could someone please help me understand how
> >>> and where exactly these accessor functions are defined, and how and where
> >>> they are to be modified to be able to access the data in the sqlite db
> >>> that I am creating for the organism I am working on?
> >>>
> >>> ==========================
> >>>
> >>> > cols(org.At.tair.db)
> >>>  [1] "TAIR"         "CHRLOC"       "CHRLOCEND"    "ENZYME"       "PATH"
> >>>  [6] "PMID"         "REFSEQ"       "SYMBOL"       "GENENAME"     "GO"
> >>> [11] "EVIDENCE"     "ONTOLOGY"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"
> >>> [16] "ARACYC"       "ARACYCENZYME" "ENTREZID"     "CHR"
> >>> =======================================
> >>>
> >>> Please point me to any documentation available for the same.
> >>>
> >>> Thanks for your time,
> >>> Sashi
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor <at> ...
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > Hi Marc,
> >
> > Thanks for your prompt reply. Referring to the document you pointed me to,
> > I created another R script within the organism package skeleton (an R
> > script apart from zzz.R) and set the cols and keytypes accessor methods.
> >
> > Bimaps are created as part of every annotation package. How do we use
> > these Bimaps in these accessor methods? Am I right in thinking that these
> > Bimaps are to be used in these accessor methods? Or do those Bimaps have
> > to be accessed only via the get(), mget(), and toTable() methods?
> >
> > Also, can you please let me know if there is any documentation available
> > on how the GO views are created? I see there are separate tables like
> > go_cc, go_mf, go_bp, etc. in the Arabidopsis annotation package. Is it
> > necessary to have tables like go_cc, go_mf, go_bp, and go_mf_all in the
> > sqlite database for the customized annotation package I am creating?
> > Would a single table for all GO annotations not suffice?
> >
> > Thanks again for your time,
> > Sashi
> Hi Sashi,
>
> I really doubt that you need to think about bimaps at all. You don't
> need them to implement select, cols, keytypes or keys. And they are
> really only still supported for the sake of older legacy code. The get,
> mget, and toTable methods are defined to help with bimaps, but you
> probably don't need to use these methods anyways. So it's very unlikely
> that you would even need to use bimaps let alone implement them.
>
> And the go view is just a SQLite database view. A view is sort of like
> a pre-canned database query that appears as a table. Our "go view" is
> really just the union of go_bp, go_mf, and go_cc tables. Those three
> separate tables allow us to still keep the different terms (from the
> different ontologies) as separate from each other in the database. But
> since we are using a view, we can also easily query all three of them
> (as if they were lumped together) WITHOUT actually duplicating all that
> data into another enormous table. And the performance for this is still
> great.
>
> You can read a bit about how SQLITE views are created here if you are
> curious:
>
> http://www.sqlite.org/lang_createview.html
>
> But if you are making an org package, why not just use
> makeOrgPackageFromNCBI?
>
> Marc
>
Thanks a lot Marc!!

It's good to know that the Bioconductor community is trying to move away from
Bimaps and adopt the cols, select, keys, and keytypes methods for any sort of
queries. Are the reverse maps that are part of Bimaps also handled by these
accessors?

As I understand it, makeOrgPackageFromNCBI perhaps still generates Bimaps for
the sake of older legacy code, and in the near future perhaps all annotation
packages will just have a sqlite database with these accessors defined. Am I
correct?

I had started by looking at how to build a sqlite db with some of the
mappings we have, and had not used the makeOrgPackageFromNCBI function. My
thinking was that understanding how the sqlite db is built would enable
me to add any new mappings that are not part of NCBI.

So, to summarize, for annotation package development one approach is to use
makeOrgPackageFromNCBI(), and the other is to build a sqlite db and
then define these accessors, as described in the PDF you linked
earlier. Either way, no Bimaps are needed for the package development
as such.
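
For my own notes, here is a rough sketch of the "go view" idea as I understand
it from the explanation quoted above: three separate tables plus a view that
is their union. This is only an illustration in Python's built-in sqlite3
module, not the actual org.At.tair.sqlite schema. The column names, the extra
"ontology" column, the made-up IDs, and the choice of UNION ALL are all my
assumptions.

```python
import sqlite3

# Hypothetical, simplified schema: column names and IDs are made up for
# illustration and do not come from any real annotation package.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

for table in ("go_bp", "go_mf", "go_cc"):
    cur.execute(f"CREATE TABLE {table} (gene_id TEXT, go_id TEXT, evidence TEXT)")

cur.execute("INSERT INTO go_bp VALUES ('gene1', 'GO:BP0001', 'ISS')")
cur.execute("INSERT INTO go_mf VALUES ('gene1', 'GO:MF0001', 'ISS')")
cur.execute("INSERT INTO go_cc VALUES ('gene1', 'GO:CC0001', 'IDA')")

# The view stores only the query; no rows are copied into a new table.
cur.execute("""
    CREATE VIEW go AS
        SELECT gene_id, go_id, evidence, 'BP' AS ontology FROM go_bp
        UNION ALL
        SELECT gene_id, go_id, evidence, 'MF' AS ontology FROM go_mf
        UNION ALL
        SELECT gene_id, go_id, evidence, 'CC' AS ontology FROM go_cc
""")

# Query the view as if it were a single table covering all three ontologies.
rows = cur.execute(
    "SELECT go_id, ontology FROM go WHERE gene_id = ? ORDER BY ontology",
    ("gene1",),
).fetchall()

# Confirm that "go" is registered as a view rather than a table.
kind = cur.execute(
    "SELECT type FROM sqlite_master WHERE name = 'go'"
).fetchone()[0]

print(rows)
print(kind)
```

If this is right, it would explain why "GO" can be queried even though the
database has no table of that name.
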
Thanks for your time,
-Sashi