[Bioc-devel] Problem with subset("AnnDbBimap", ...) ?

Laurent Gautier lgautier at gmail.com
Sat Feb 7 15:13:40 CET 2009


Hi Hervé,

Thanks for the detailed answer.

I understand that the choice was made to forbid the addition of 
[L|R]keys to a Bimap (which you call "junk keys"). I suspect that
implementation concerns weighed in the decision, but that's for a
separate thread.

Unless the message is that the environment-like API is there for more
than backward compatibility, the main (consistency) problem I have
with subset("AnnDbBimap", ...) is still present after reading your
explanations, although it might stem from how the function subset in
base R works on a data.frame.

I'll illustrate it with an example:

> subset(CO2, Treatment == "unfair")
[1] Plant     Type      Treatment conc      uptake
<0 rows> (or 0-length row.names)

The function subset("data.frame", ...) is then no less endomorphic than
subset("AnnDbBimap", ...), yet it returns an empty data.frame rather 
than raise an error such as 'no "unfair" Treatment'.
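
To make the comparison concrete, here is a quick check that needs nothing
beyond base R (CO2 ships with the 'datasets' package). It is only a sketch of
the point above: the result is empty, but it is still a data.frame with the
same columns.

```r
# CO2 ships with R's 'datasets' package, so nothing needs to be installed.
empty <- subset(CO2, Treatment == "unfair")

is.data.frame(empty)                  # still a data.frame
nrow(empty)                           # no row matched the condition
identical(names(empty), names(CO2))   # the columns are preserved
```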

In its current form, the function subset("AnnDbBimap", ...) might be 
pushing complexity toward the user for use-cases such as:
"I have a list of arbitrary gene symbols, and I'd like to get the
probes/probesets associated with those".

The current (as of today) implementation for subset("Bimap", ...) is:

setMethod("subset", "Bimap",
     function(x, Lkeys=NULL, Rkeys=NULL)
     {
         Lkeys(x) <- Lkeys
         Rkeys(x) <- Rkeys
         x
     }
)

while it could be like:

setMethod("subset", "Bimap",
     function(x, Lkeys=NULL, Rkeys=NULL, quiet=FALSE)
     {
         if (quiet) {
             ## silently drop the keys that are not in the map
             Lkeys(x) <- Lkeys[Lkeys %in% Lkeys(x)]
             Rkeys(x) <- Rkeys[Rkeys %in% Rkeys(x)]
         } else {
             ## current behavior: error on unknown keys
             Lkeys(x) <- Lkeys
             Rkeys(x) <- Rkeys
         }
         x
     }
)
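
To show the difference without needing any annotation package, here is a
self-contained sketch. ToyBimap, restrictKeys, toySubset and toyLinks are
hypothetical names made up for illustration; they are not part of
AnnotationDbi, and the real Bimap classes do not store their keys this way.
The only point is that the "quiet" variant still returns an object of the
same class, possibly with no links left in it.

```r
library(methods)

# Hypothetical stand-in for a Bimap: a table of (left, right) links plus the
# keys currently allowed on each side. Not the AnnotationDbi implementation.
setClass("ToyBimap", representation(links = "data.frame",
                                    lkeys = "character",
                                    rkeys = "character"))

# Strict restriction: an unknown key raises an error.
restrictKeys <- function(current, new, side) {
    if (is.null(new)) return(current)          # NULL means "leave unchanged"
    bad <- setdiff(new, current)
    if (length(bad) > 0L)
        stop("value for \"", bad[1L], "\" not found (", side, " keys)")
    new
}

toySubset <- function(x, Lkeys = NULL, Rkeys = NULL, quiet = FALSE) {
    if (quiet) {
        # Silently drop the keys the map does not know about.
        if (!is.null(Lkeys)) Lkeys <- Lkeys[Lkeys %in% x@lkeys]
        if (!is.null(Rkeys)) Rkeys <- Rkeys[Rkeys %in% x@rkeys]
    }
    x@lkeys <- restrictKeys(x@lkeys, Lkeys, "left")
    x@rkeys <- restrictKeys(x@rkeys, Rkeys, "right")
    x
}

# Links visible through the current key restriction.
toyLinks <- function(x) {
    keep <- x@links$left %in% x@lkeys & x@links$right %in% x@rkeys
    x@links[keep, , drop = FALSE]
}

m <- new("ToyBimap",
         links = data.frame(left  = c("p1", "p2", "p3"),
                            right = c("GeneA", "GeneA", "GeneB"),
                            stringsAsFactors = FALSE),
         lkeys = c("p1", "p2", "p3"),
         rkeys = c("GeneA", "GeneB"))

# Arbitrary symbols, one of which is unknown to the map.
symbols <- c("GeneA", "foo")

# Strict mode fails; keep the error message instead of aborting.
strict <- tryCatch(toySubset(m, Rkeys = symbols),
                   error = function(e) conditionMessage(e))

# Quiet mode drops "foo" and returns a ToyBimap restricted to "GeneA".
lenient <- toySubset(m, Rkeys = symbols, quiet = TRUE)
toyLinks(lenient)$left
```

With only unknown keys, e.g. toySubset(m, Rkeys="foo", quiet=TRUE), the
result is a ToyBimap with no visible links, which is the analogue of the
empty data.frame above.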


Just a thought,



L.




Hervé Pagès wrote:
> Hi Laurent,
> 
> All this is consistent. One important part of the contract for
> subset(), Lkeys<-, Rkeys<-, [ is that they behave like endomorphisms
> i.e. they return an instance of the same class as the original
> object.
> mouse4302SYMBOL is an AnnDbBimap object so any of the functions above
> must return a (valid) AnnDbBimap object.
> The keys of a valid AnnDbBimap object cannot be anything. For example
> if 'x' is a mapping from probeset ids to entrez ids, the left keys
> must be valid probeset ids (for this chip) and the right keys must be
> valid entrez ids.
> What kind of AnnDbBimap object would be returned by
> subset(mouse4302SYMBOL, Rkeys="foo") ? Or equivalently, what
> kind of AnnDbBimap object would become 'x' after
>   x <- mouse4302SYMBOL; Rkeys(x) <- "foo".
> It would be an AnnDbBimap object with junk keys but valid
> AnnDbBimap objects don't support this.
> 
> I added these functions when I worked on faking the environment
> interface for SQLite-based annotations. Note that they are not
> part of the environment-like API. They are low-level
> functions that I first wrote and used internally so it would
> be easier for me to build the environment-like API (mget, get,
> ls, etc...). My first intention was not to export them but then
> I realized they had their own added-value so I exported and
> documented them. Since they are not part of the environment-like
> API, I had no constraint of backward compatibility which was nice
> because then I could decide to make them do what I considered the
> right thing. OTOH I had to make the environment-like API backward
> compatible and that's why you can use junk keys in mget (granted
> that you specify ifnotfound=NA):
> 
>   > mget("foo", mouse4302SYMBOL, ifnotfound=NA)
>   $foo
>   [1] NA
> 
> mget() returns a list, not an AnnDbBimap instance (it's not an
> endomorphism) so it can return a list with anything in it without
> breaking any rule.
> 
> We could add the hasLkey() and hasRkey() but since this would be
> equivalent to "foo" %in% Rkeys(x), I'm not sure they would have
> a lot of added value though. The performance of "foo" %in% Rkeys(x)
> should be good enough, especially the 2nd time you do this on 'x'
> because Rkeys() (like Lkeys() and other low-level functions in
> AnnotationDbi) cache their result (in a hidden environment).
> 
> H.
> 
> 
> Laurent Gautier wrote:
>> Dear list,
>>
>> The function subset("AnnDbBimap", ...) is returning an error whenever 
>> the resulting subset should be the empty set.
>>
>> Example:
>>
>> library(mouse4302.db)
>> subset(mouse4302SYMBOL,
>>        Rkeys="foo")
>>
>>
>> returns:
>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>   value for "foo" not found
>>
>> This is true for either Lkeys or Rkeys.
>>
>>
>> The man page does say "
>> Lkeys
>> The new Lkeys (must be a subset of the current Lkeys).
>>
>> Rkeys
>> The new Rkeys (must be a subset of the current Rkeys).
>> "
>>
>> but this is limiting the use for the function and encourages the use 
>> of the environment-like API, although marked as provided "for backward 
>> compatibility".
>>
>> Wouldn't it be good to either:
>>
>> - have subset return without an error
>>
>> - have at least functions such as hasLkey and hasRkey ?
>>
>>
>>
>> L.
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


