[Bioc-devel] Purpose of subsetted GO tables in OrgDb packages

James W. MacDonald jm@cdon @end|ng |rom uw@edu
Mon May 11 23:57:41 CEST 2020


There is a bug in the way OrgDb packages that use the NOSCHEMA schema
figure out which tables to query. It was introduced by changes that
Martin made to AnnotationForge last September:

commit 02749e3779eb5036211d600915506bab86633ea0
Author: Martin Morgan <martin.morgan using roswellpark.org>
Date:   Fri Sep 27 12:18:48 2019 -0400

    support go_cc, go_cc_all, etc when making OrgDb from data.frame()s

The bug can be reproduced with:

library(AnnotationForge)
example(makeOrgPackage)
library(org.Tguttata.eg.db)
select(org.Tguttata.eg.db, head(keys(org.Tguttata.eg.db)), "GO")
Error in FUN(X[[i]], ...) :
  Two fields in the source DB have the same name.

The error comes from the internal function .deriveTableNameFromField, which
tries to infer the correct table to use for the SQL query. There are now
many tables with 'GO' as one of their field names:

> con <- org.Tguttata.eg_dbconn()
> z <- grep("go", dbListTables(con), value = TRUE)
> sapply(z, dbListFields, con = con)
$go
[1] "_id"      "GO"       "EVIDENCE" "ONTOLOGY"

$go_all
[1] "_id"         "GOALL"       "EVIDENCEALL" "ONTOLOGYALL"

$go_bp
[1] "_id"      "GO"       "EVIDENCE"

$go_bp_all
[1] "_id"      "GO"       "EVIDENCE"

$go_cc
[1] "_id"      "GO"       "EVIDENCE"

$go_cc_all
[1] "_id"      "GO"       "EVIDENCE"

$go_mf
[1] "_id"      "GO"       "EVIDENCE"

$go_mf_all
[1] "_id"      "GO"       "EVIDENCE"

It's no longer possible to figure out which table to use when a user wants
data from the 'GO' column. In the past this wasn't a problem because there
were just two tables (go and go_all) for NOSCHEMA OrgDbs, so it would pick
the go table.
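
The ambiguity can be illustrated without a database connection. What follows
is only a minimal sketch in base R, assuming the table-to-field mapping shown
above; tables_with_field is a hypothetical helper and is not the actual code
of .deriveTableNameFromField.

```r
# Field names per table, copied from the dbListFields() output above.
go_fields <- list(
  go        = c("_id", "GO", "EVIDENCE", "ONTOLOGY"),
  go_all    = c("_id", "GOALL", "EVIDENCEALL", "ONTOLOGYALL"),
  go_bp     = c("_id", "GO", "EVIDENCE"),
  go_bp_all = c("_id", "GO", "EVIDENCE"),
  go_cc     = c("_id", "GO", "EVIDENCE"),
  go_cc_all = c("_id", "GO", "EVIDENCE"),
  go_mf     = c("_id", "GO", "EVIDENCE"),
  go_mf_all = c("_id", "GO", "EVIDENCE")
)

# Hypothetical helper: names of all tables that contain the given field.
tables_with_field <- function(field, fields_by_table) {
  hits <- vapply(fields_by_table, function(f) field %in% f, logical(1))
  names(fields_by_table)[hits]
}

# With all eight tables, 'GO' matches several tables, so it is ambiguous.
new_hits <- tables_with_field("GO", go_fields)

# With only go and go_all (the previous layout), 'GO' matches one table.
old_hits <- tables_with_field("GO", go_fields[c("go", "go_all")])

print(new_hits)
print(old_hits)
```

Any lookup that requires a field name to map to exactly one table will fail
on the first result and succeed on the second.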

An easy fix would be to exclude every table except go and go_all within
.deriveTableNameFromField, but then it seems odd to have the other tables
at all. I looked for any case where something other than the go or go_all
table is used and couldn't find one, which makes me wonder why these tables
exist. So maybe the simplest fix is to back out the changes that Martin
made, and perhaps also remove the subsetted GO tables from the DBSCHEMA
packages?
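
The first option (ignoring the subsetted tables during lookup) might look
roughly like the sketch below. It is written in base R against a plain list
rather than a live connection; resolve_table and its keep_go argument are
hypothetical names used for illustration and do not reflect how
.deriveTableNameFromField is actually written.

```r
# Field names per table, as reported by dbListFields() for the example package.
fields_by_table <- list(
  go        = c("_id", "GO", "EVIDENCE", "ONTOLOGY"),
  go_all    = c("_id", "GOALL", "EVIDENCEALL", "ONTOLOGYALL"),
  go_bp     = c("_id", "GO", "EVIDENCE"),
  go_bp_all = c("_id", "GO", "EVIDENCE"),
  go_cc     = c("_id", "GO", "EVIDENCE"),
  go_cc_all = c("_id", "GO", "EVIDENCE"),
  go_mf     = c("_id", "GO", "EVIDENCE"),
  go_mf_all = c("_id", "GO", "EVIDENCE")
)

# Hypothetical resolver: keep_go names the GO tables that stay eligible.
resolve_table <- function(field, fields_by_table,
                          keep_go = c("go", "go_all")) {
  tabs <- names(fields_by_table)
  # Drop the subsetted GO tables (go_bp, go_cc_all, ...) from consideration.
  subsetted <- grepl("^go_", tabs) & !(tabs %in% keep_go)
  candidates <- fields_by_table[!subsetted]
  hits <- vapply(candidates, function(f) field %in% f, logical(1))
  found <- names(candidates)[hits]
  if (length(found) != 1L) {
    stop("cannot resolve a unique table for field '", field, "'")
  }
  found
}

go_table    <- resolve_table("GO", fields_by_table)
goall_table <- resolve_table("GOALL", fields_by_table)
```

Under these assumptions 'GO' resolves to the go table and 'GOALL' to the
go_all table, as before the change, while the subsetted tables remain in the
database but are never chosen.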

Best,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
