[Rd] strange behavior in 'inherits' check for loaded S4 object

John Chambers jmc at r-project.org
Sun Jul 31 14:12:15 CEST 2016


(Just returning from the "wilds" of Canada, so not able to comment on the specifics, but ...)

There is a basic point about generic functions that may be related to the "private" class question and my earlier remarks that Martin alluded to.

R (and S4 before it)  allows packages to define methods for a generic function in another package.  Say, for plot() in graphics.

The current model is that the generic plot() remains one function, specifically a generic associated with the graphics package.

Package A might define a method corresponding to one or two classes defined in that package.  When A is loaded, those methods are added to the table for plot() in the current session.

Now suppose a user calls a function, foo(), in package B, and that foo() in turn calls plot().  This is the same plot() function, and in particular will include the methods supplied from package A.

This is regardless of the two packages having any overt connection.  Also, the methods are accepted by the generic function regardless of whether the class is explicitly exported or not.  In this sense, classes cannot be entirely private if methods are defined for a non-local function.  Namespaces are not directly relevant.

Whether this can lead to strange behavior isn't clear, and if so, is it a sign that something undesirable was done in the particular example?  (In Extending R, I suggested a "right to write methods" that  would discourage a package from having methods unless it owned the function or some of the classes.)

R could adopt a different model for generic functions, where a package that defined a method for a non-exported class would create a "local" version of the generic, but that would likely raise some other issues.

But seems like a useful topic for discussion.

John

On Jul 30, 2016, at 11:07 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

>>>>>> Kevin Ushey <kevinushey at gmail.com>
>>>>>>    on Fri, 29 Jul 2016 11:46:19 -0700 writes:
> 
>> I should add one more item that may be related here --
>> calling 'methods:::.requirePackage' returns a different
>> result based on whether the package namespace is already
>> loaded or not.
> 
>> If the package namespace is not loaded, the package is
>> loaded and attached, and the package environment is
>> returned:
> 
>>> methods:::.requirePackage("digest")
>>    Loading required package: digest <environment:
>> package:digest> attr(,"name") [1] "package:digest"
>> attr(,"path") [1]
>> "/Users/kevin/Library/R/3.3/library/digest"
>>> "digest" %in% loadedNamespaces()
>>    [1] TRUE
>>> "package:digest" %in% search()
>>    [1] TRUE
> 
>> On the other hand, if the package namespace has already
>> been loaded, the package namespace is returned without
>> attaching the package:
> 
>>> requireNamespace("digest")
>>    Loading required namespace: digest
>>> methods:::.requirePackage("digest")
>>    <environment: namespace:digest>
>>> "digest" %in% loadedNamespaces()
>>    [1] TRUE
>>> "package:digest" %in% search()
>>    [1] FALSE
> 
>> This may be intentional, but the behavior seems surprising
>> and could be responsible for the behavior outlined
>> earlier.
> 
> Yes, the behavior you outlined earlier is buggy, and I also have
> seen similar bugous behavior for the case of non-exported
> classes.
> 
> Part of it is historical:  The S4 code was mostly written before
> namespaces were introduced into R;   I vaguely remember John
> Chambers (the principal creator of S4) saying that he did not
> intend the formal classes to be not visible... which in some
> sense only contains the fact that he (or anybody) would not
> think much about hidden objects before they were introduced.
> 
> Still, in the mean time, most of us have seen many cases where
> we wanted to have "private" classes,  and many packages do have
> them, too.... and they "mostly work" ;-)
> 
> In other words, I agree that it would be very desirable to get
> to the bottom of this and fix such problems.
> 
> .requirePackage() is among the parts of the methods package code
> which are quite delicate (and not much documented AFAIK, the hidden
> .requirePackage() function is a good example!).
> 
> Delicate for at least two reasons:
> 
> 1) They are not only used in crucial steps when "bootstrapping"
>   the methods package ('methods' has to define its own S4
>   generics, methods, and classes before the package "exists"),
> 
> 1b) they are also used both when building and installing another
>    'methods'-dependent package.  This could be called
>    "bootstrapping another package".
> 
> 2) they are also much used whenever S4 code (aka
>   'methods'-dependent) packages are loaded and attached,
>   so should be as fast as possible.
>   I think you know that in recent years there have been
>   considerable (and successful!) efforts to speedup package
>   loading time, e.g. speeding up 'library(Matrix)' by about
>   2 factors of 2, and changing such functions should not
>   deteriorate package-load speed noticably.
> 
> Still, it is very desirable to improve / fix the issue:
> After all this musings, I'd currently guess that it would be a
> good idea if  .requirePackage()  would always return the
> *namespace* of the corresponding package unless that does not
> yet exist, as in the the 'bootstrap' situations above.
> ... or we'd add a new argument to .requirePackage() dealing with
> that, or we use two functions:  .requirePackage() needed for
> boostrapping, returning a package envionment, and
> .requireNamespace() used to access class (and generic function!)
> environments.
> 
> Well tested patches are very welcome; as is filing a formal
> bug report with bugzilla (https://bugs.r-project.org/ ;
> (if you have no "account" there, we will have to 
> manually create one, for now, see the 'Note' on
> https://www.r-project.org/bugs.html because of the spammer
> attacks earlier this month ... but I see you, Kevin are
> registered there),
> or
> just continuing your findings here.
> 
> Martin
> 
> 
>> Best, Kevin
> 
>> On Fri, Jul 29, 2016 at 11:37 AM, Kevin Ushey
>> <kevinushey at gmail.com> wrote:
>>> I have a small idea as to what's going on now; at least,
>>> why exporting the class resolves this particular issue.
>>> 
>>> Firstly, when an S4 class is not exported, the associated
>>> '.__C__<class>' object is not made part of the package
>>> environment.  For example, I see:
>>> 
>>>> getAnywhere(".__C__SubMatrix") A single object matching
>>> '.__C__SubMatrix' was found It was found in the following
>>> places namespace:s4inherits with value < ... >
>>> 
>>> Note that the symbol is only discovered in the package
>>> namespace. When the class is exported (e.g. with
>>> 'exportClasses(SubMatrix)' in the NAMESPACE file), it's
>>> found both in 'package:s4inherits' and
>>> 'namespace:s4inherits'.
>>> 
>>> Secondly, when R attempts to resolve the superclasses for
>>> an S3 class, the function 'methods:::.extendsForS3' is
>>> called. Tracing that code eventually gets us here:
>>> 
>>> https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L255
>>> 
>>> Note that we reach this code path as the S3 class cache
>>> has not been populated yet; ie, this code returns NULL:
>>> 
>>> https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L238-L240
>>> 
>>> So, the class hierarchy is looked up using this code:
>>> 
>>> if(isTRUE(nzchar(package))) { whereP <-
>>> .requirePackage(package) value <- get0(cname, whereP,
>>> inherits = inherits) }
>>> 
>>> However, because the '.__C__SubMatrix' object is only
>>> made available in the package's namespace, not within the
>>> package environment, it is not resolved, and so lookup
>>> fails. (Presumedly, that lookup is done when initially
>>> building a cache for S3 dispatch?) So, I wonder if that
>>> class lookup should occur within the package's namespace
>>> instead?
>>> 
>>> Thanks for your time, Kevin
>>> 
>>> On Sat, Jun 25, 2016 at 12:46 PM, Kevin Ushey
>>> <kevinushey at gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> (sorry for the wall of text; the issue here appears to
>>>> be rather complicated)
>>>> 
>>>> I'm seeing a somewhat strange case where checking
>>>> whether an S4 object inherits from a parent class
>>>> defined from another package with 'inherits' fails if
>>>> that object is materialized through a call to
>>>> 'load'. That's a mouthful, so I've put together a
>>>> relatively small reproducible example online here:
>>>> 
>>>> https://github.com/kevinushey/s4inherits
>>>> 
>>>> This package, 's4inherits', defines an S4 class,
>>>> 'SubMatrix', that inherits from the 'dsyMatrix' class
>>>> defined in the Matrix package.  After installing the
>>>> package, I run some simple tests:
>>>> 
>>>> $ R -f test-save.R
>>>> 
>>>>> library(s4inherits) data <- SubMatrix(1)
>>>>> 
>>>>> is(data, "SubMatrix")
>>>> [1] TRUE
>>>>> inherits(data, "SubMatrix")
>>>> [1] TRUE
>>>>> 
>>>>> is(data, "dsyMatrix")
>>>> [1] TRUE
>>>>> inherits(data, "dsyMatrix")
>>>> [1] TRUE
>>>>> 
>>>>> save(data, file = "test.RData")
>>>>> 
>>>> 
>>>> All the inheritance checks report as we would expect. I
>>>> check that the inheritance reports are as expected, then
>>>> save that object to 'test.RData'. I then load that data
>>>> file in a new R session and run the same checks:
>>>> 
>>>> $ R -f test-load.R
>>>> 
>>>>> library(methods) load("test.RData")
>>>>> 
>>>>> inherits(data, "SubMatrix")
>>>> Loading required package: s4inherits [1] TRUE
>>>>> is(data, "SubMatrix")
>>>> [1] TRUE
>>>>> 
>>>>> inherits(data, "dsyMatrix")
>>>> [1] FALSE # (??)
>>>>> is(data, "dsyMatrix")
>>>> [1] TRUE
>>>>> 
>>>> 
>>>> Note that R now reports that my loaded object does _not_
>>>> inherit from "dsyMatrix", yet this occurs only when
>>>> checked with 'inherits()' -- 'is' produces the expected
>>>> result.
>>>> 
>>>> I do not see the behavior if I explicitly load / attach
>>>> the 's4inherits' package before loading the associated
>>>> data file; it only occurs if the package namespace is
>>>> loaded in response to loading the data object hosting a
>>>> 'SubMatrix' object.
>>>> 
>>>> More precisely, the package namespace is loaded when the
>>>> promise hosting the data object is evaluated; that
>>>> promise being generated by 'load', if I understand
>>>> correctly. Somehow, evaluation of that promise within
>>>> the call to 'inherits' in this very special case causes
>>>> the unexpected behavior -- ie, if the first thing that
>>>> you do with the loaded object is check its class with
>>>> 'inherits', then you get this unexpected result.
>>>> 
>>>> Even more, this behavior seems to go away if the
>>>> 's4inherits' package explicitly exports the class --
>>>> however, you could imagine this class normally being
>>>> internal to the package, and so it may be undesirable to
>>>> export it.
>>>> 
>>>> I checked a bit into the C code, and IIUC the check here
>>>> looks up the class hierarchy in the R_S4_extends_table
>>>> object defined in 'attrib.c', so it seems like that
>>>> mapping is potentially not getting constructed with the
>>>> full hierarchy (although hopefully someone with more
>>>> knowledge of the S4 internals + interaction with S3 can
>>>> elaborate).
>>>> 
>>>> (FWIW, this was distilled from a case where S3 dispatch
>>>> on a similar loaded S4 object failed, due to failure to
>>>> resolve the class hierarchy for the S3 dispatch
>>>> context.)
>>>> 
>>>> Thanks, Kevin
>>>> 
>>>> ---
>>>> 
>>>> $ R --slave -e "utils::sessionInfo()" R Under
>>>> development (unstable) (2016-06-13 r70769) Platform:
>>>> x86_64-apple-darwin15.5.0 (64-bit) Running under: OS X
>>>> 10.11.5 (El Capitan)
> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list