[Bioc-devel] Problem with subset("AnnDbBimap", ...) ?

Hervé Pagès hpages at fhcrc.org
Fri Feb 6 23:13:59 CET 2009

Hi Laurent,

All this is consistent. One important part of the contract for
subset(), Lkeys<-, Rkeys<-, [ is that they behave like endomorphisms,
i.e. they return an instance of the same class as the original object.
mouse4302SYMBOL is an AnnDbBimap object so any of the functions above
must return a (valid) AnnDbBimap object.
The keys of a valid AnnDbBimap object cannot be arbitrary. For example,
if 'x' is a mapping from probeset ids to entrez ids, the left keys
must be valid probeset ids (for this chip) and the right keys must be
valid entrez ids.
What kind of AnnDbBimap object would be returned by
subset(mouse4302SYMBOL, Rkeys="foo") ? Or equivalently, what
kind of AnnDbBimap object would become 'x' after
   x <- mouse4302SYMBOL; Rkeys(x) <- "foo" ?
It would be an AnnDbBimap object with junk keys, but a valid
AnnDbBimap object cannot have junk keys.
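
To make the contract concrete, here is a minimal sketch using a toy S4
class. The class ToyBimap, its slots and the helper restrictRkeys() are
hypothetical names made up for illustration; this is not how
AnnotationDbi implements AnnDbBimap, only the same idea in miniature:
restricting the keys must give back a valid object of the same class,
so junk keys have to be rejected.

```r
library(methods)

# Hypothetical toy class, NOT the AnnotationDbi implementation.
# 'allRkeys' is the universe of valid right keys, 'Rkeys' the current ones.
setClass("ToyBimap",
    representation(allRkeys = "character", Rkeys = "character"),
    validity = function(object) {
        bad <- setdiff(object@Rkeys, object@allRkeys)
        if (length(bad) > 0L)
            return(paste("invalid right keys:", paste(bad, collapse = ", ")))
        TRUE
    }
)

# Restrict the right keys. The result is again a (valid) ToyBimap,
# otherwise validObject() raises an error.
restrictRkeys <- function(x, keys) {
    x@Rkeys <- keys
    validObject(x)
    x
}

x <- new("ToyBimap",
         allRkeys = c("Actb", "Gapdh", "Trp53"),
         Rkeys    = c("Actb", "Gapdh", "Trp53"))

# Valid keys: same class in, same class out.
y <- restrictRkeys(x, "Actb")

# Junk key: no valid ToyBimap can be built, so we get an error instead.
res <- tryCatch(restrictRkeys(x, "foo"),
                error = function(e) conditionMessage(e))
```

Here 'y' is still a ToyBimap, while 'res' holds the error message
raised for the junk key.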

I added these functions when I worked on faking the environment
interface for SQLite-based annotations. Note that they are not
part of the environment-like API. They are low-level
functions that I first wrote and used internally so it would
be easier for me to build the environment-like API (mget, get,
ls, etc...). My first intention was not to export them but then
I realized they had their own added-value so I exported and
documented them. Since they are not part of the environment-like
API, I had no constraint of backward compatibility which was nice
because then I could decide to make them do what I considered the
right thing. OTOH I had to make the environment-like API backward
compatible and that's why you can use junk keys in mget (granted
that you specify ifnotfound=NA):

   > mget("foo", mouse4302SYMBOL, ifnotfound=NA)
   [1] NA

mget() returns a list, not an AnnDbBimap instance (it's not an
endomorphism) so it can return a list with anything in it without
breaking any rule.
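
The same behavior can be seen with a plain R environment, which is what
the environment-like API imitates. This is only an analogy in base R
(the environment 'e' and the key names are made up), not a run against
mouse4302SYMBOL:

```r
# A plain environment standing in for an annotation map (illustration only).
e <- new.env()
assign("probe1", "Actb", envir = e)

# One known key and one junk key; ifnotfound=NA fills in the missing one.
res <- mget(c("probe1", "foo"), envir = e, ifnotfound = NA)

# The result is an ordinary list, not an environment, so nothing
# constrains what it may contain.
is.list(res)
is.environment(res)
```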

We could add hasLkey() and hasRkey(), but since they would be
equivalent to expressions like "foo" %in% Rkeys(x), I'm not sure they
would add much value. The performance of "foo" %in% Rkeys(x)
should be good enough, especially the 2nd time you do this on 'x'
because Rkeys() (like Lkeys() and other low-level functions in
AnnotationDbi) cache their result (in a hidden environment).
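
As a rough illustration of why the second lookup is cheap, here is a
sketch of caching in a hidden environment. All the names below
(makeCachedKeys, slowFetch, toyRkeys, hasToyRkey) are hypothetical and
the code is not taken from AnnotationDbi; it only shows the general
technique, assuming the expensive step is fetching the full set of keys.

```r
# Hypothetical sketch of result caching; not AnnotationDbi's actual code.
makeCachedKeys <- function(fetchKeys) {
    cache <- new.env(parent = emptyenv())   # hidden from the caller
    function() {
        if (!exists("keys", envir = cache, inherits = FALSE))
            assign("keys", fetchKeys(), envir = cache)
        get("keys", envir = cache, inherits = FALSE)
    }
}

# Stand-in for an expensive query; counts how often it really runs.
nfetch <- 0L
slowFetch <- function() {
    nfetch <<- nfetch + 1L
    c("Actb", "Gapdh", "Trp53")
}

toyRkeys <- makeCachedKeys(slowFetch)

# What a hasRkey()-style helper would boil down to.
hasToyRkey <- function(key) key %in% toyRkeys()

first  <- hasToyRkey("foo")    # triggers the fetch
second <- hasToyRkey("Actb")   # served from the cache
```

After both calls 'nfetch' is still 1: the keys were fetched once and
the second membership test reused the cached vector.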


Laurent Gautier wrote:
> Dear list,
> The function subset("AnnDbBimap", ...) is returning an error whenever 
> the resulting subset should be the empty set.
> Example:
> library(mouse4302.db)
> subset(mouse4302SYMBOL,
>        Rkeys="foo")
> returns:
> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>   value for "foo" not found
> This is true for either Lkeys or Rkeys.
> The man page does say "
> Lkeys
> The new Lkeys (must be a subset of the current Lkeys).
> Rkeys
> The new Rkeys (must be a subset of the current Rkeys).
> "
> but this is limiting the use for the function and encourages the use of 
> the environment-like API, although marked as provided "for backward 
> compatibility".
> Wouldn't it be good to either have:
> - have subset return without an error
> - have at least functions such as hasLkey and hasRkey ?
> L.
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
