[Bioc-devel] Purpose of subsetted GO tables in OrgDb packages

Martin Morgan mtmorg@n@b|oc @end|ng |rom gm@||@com
Tue May 12 17:40:51 CEST 2020


Just to acknowledge that I see my name here, and am working to remember why the change was introduced. It wasn't arbitrary, but I don't remember the full context.

Will report back ASAP.

Martin

On 5/11/20, 5:59 PM, "Bioc-devel on behalf of James W. MacDonald" <bioc-devel-bounces using r-project.org on behalf of jmacdon using uw.edu> wrote:

    There is a bug in the way that the OrgDb packages that use the NOSCHEMA
    schema figure out which tables to use, that was introduced by some changes
    that Martin made to AnnotationForge last September:

    commit 02749e3779eb5036211d600915506bab86633ea0
    Author: Martin Morgan <martin.morgan using roswellpark.org>
    Date:   Fri Sep 27 12:18:48 2019 -0400

        support go_cc, go_cc_all, etc when making OrgDb from data.frame()s

    Which can be shown by doing

    library(AnnotationForge)
    example(makeOrgPackage)
    library(org.Tguttata.eg.db)
    select(org.Tguttata.eg.db, head(keys(org.Tguttata.eg.db)), "GO")
    Error in FUN(X[[i]], ...) :
      Two fields in the source DB have the same name.

    This comes from the internal function .deriveTableNameFromField, which
    tries to infer the correct tables to use for the SQL query. The problem is
    that there are now lots of tables with 'GO' as one of their field names:

    > con <- org.Tguttata.eg_dbconn()
    > z <- grep("go", dbListTables(con), value = TRUE)
    > sapply(z, dbListFields, con = con)
    $go
    [1] "_id"      "GO"       "EVIDENCE" "ONTOLOGY"

    $go_all
    [1] "_id"         "GOALL"       "EVIDENCEALL" "ONTOLOGYALL"

    $go_bp
    [1] "_id"      "GO"       "EVIDENCE"

    $go_bp_all
    [1] "_id"      "GO"       "EVIDENCE"

    $go_cc
    [1] "_id"      "GO"       "EVIDENCE"

    $go_cc_all
    [1] "_id"      "GO"       "EVIDENCE"

    $go_mf
    [1] "_id"      "GO"       "EVIDENCE"

    $go_mf_all
    [1] "_id"      "GO"       "EVIDENCE"

    It's no longer possible to figure out which table to use when a user wants
    data from the 'GO' column. In the past this wasn't a problem because there
    were just two tables (go and go_all) for NOSCHEMA OrgDbs, so it would pick
    the go table.

    An easy fix would be to filter out all but the go and go_all tables as part
    of .deriveTableNameFromField, but then it seems weird to even have these
    other tables. I looked for any instance where anything but the go or go_all
    tables is used and couldn't find one, which makes me wonder why we have
    them at all. So maybe the real easy fix is to just back out the changes
    that Martin made, and maybe even remove the subsetted GO tables from the
    DBSCHEMA packages as well?

    Best,

    Jim

    -- 
    James W. MacDonald, M.S.
    Biostatistician
    University of Washington
    Environmental and Occupational Health Sciences
    4225 Roosevelt Way NE, # 100
    Seattle WA 98105-6099

    _______________________________________________
    Bioc-devel using r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel
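
For anyone who wants to see the ambiguity described in the quoted message without building the example package, here is a minimal base-R sketch. The field lists are copied from the dbListFields() output above. The helpers tables_with_field() and derive_table() are hypothetical stand-ins written for illustration; they are not the AnnotationDbi code behind .deriveTableNameFromField, and the sketch models only the GO tables rather than the whole database.

```r
# Field names per table, copied from the dbListFields() output in the quoted message.
tables <- list(
  go        = c("_id", "GO", "EVIDENCE", "ONTOLOGY"),
  go_all    = c("_id", "GOALL", "EVIDENCEALL", "ONTOLOGYALL"),
  go_bp     = c("_id", "GO", "EVIDENCE"),
  go_bp_all = c("_id", "GO", "EVIDENCE"),
  go_cc     = c("_id", "GO", "EVIDENCE"),
  go_cc_all = c("_id", "GO", "EVIDENCE"),
  go_mf     = c("_id", "GO", "EVIDENCE"),
  go_mf_all = c("_id", "GO", "EVIDENCE")
)

# Hypothetical helper: names of all tables that contain `field`.
tables_with_field <- function(tables, field) {
  has_field <- vapply(tables, function(fields) field %in% fields, logical(1))
  names(tables)[has_field]
}

# Hypothetical lookup: error unless exactly one candidate table matches.
# `keep` restricts the candidate tables, mimicking the proposed filter.
derive_table <- function(tables, field, keep = names(tables)) {
  hits <- tables_with_field(tables[names(tables) %in% keep], field)
  if (length(hits) != 1L) {
    stop("field '", field, "' matches ", length(hits), " tables")
  }
  hits
}

tables_with_field(tables, "GO")                       # seven tables match
try(derive_table(tables, "GO"))                       # ambiguous, so it errors
derive_table(tables, "GO", keep = c("go", "go_all"))  # "go"
```

Restricting the candidates to go and go_all makes the lookup unique again, which is the "easy fix" floated above. It does not answer whether the per-ontology tables should exist in the first place.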

