[Rd] Change to I() in R 4.1

Pages, Herve hp@ge@ @end|ng |rom |redhutch@org
Fri Oct 30 07:08:34 CET 2020


Hi Martin,

On 10/26/20 04:52, Martin Maechler wrote:
>>
>> Hi there,
>> Is that change in R-devel intentional?
>>
>>     library(Matrix)
>>     m <- as(matrix(c(0, 1)), "sparseMatrix")
>>
>>     isS4(m)
>>     # [1] TRUE
>>
>>     x <- I(m)
>>     # Warning message:
>>     # In `class<-`(x, unique.default(c("AsIs", oldClass(x)))) :
>>     #   Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...);
>> result will no longer be an S4 object
>>
>>     isS4(x)
>>     # [1] FALSE
>>
>> This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the
>> S4 bit of the object.
>>
>> This change breaks 17 Bioconductor packages.
>>
>> Seems that the culprit is this change in how I() is implemented:
>>
>> In R 4.0.3:
>>
>>     > I
>>     function (x)
>>     {
>>       structure(x, class = unique(c("AsIs", oldClass(x))))
>>     }
>>
>> In R devel:
>>
>>     > I
>>     function (x)
>>     `class<-`(x, unique.default(c("AsIs", oldClass(x))))
> 
> Yes, (by me),  as  I()  was sticking out in the slowness bug PR#17794
>    https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.r-2Dproject.org_bugzilla_show-5Fbug.cgi-3Fid-3D17794&d=DwIDAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ej3wnc10LiheZsqjRuonTr2WwHWU4ecaDSrTBFby8wU&s=isOqEV-_6Yk1PlRZBoBchZHZYnpQxPGEZdZPgTHMkKg&e=
> 
> and the direct dangerous  `call<-`  will be replaced happily in
> I()'s definition.
> 
> *But* as Luke Tierney had remarked to R-core, direct changing of
> the class of an S4 object has given the above warning for a
> quite while, (svn r47934 | jmc | 2009-02-17 )
> *and* it has rather been an inconsistency in R, that you could
> still use "low-level" means to change the class of an S4 object
> to something invalid.
> 
> I really don't think people should be allowed to use I() to
> change a valid S4 object into an invalid one, but this is what
> happens (R 4.0.3 patched) :
> 
>> require(Matrix); M <- Matrix(0, 2,3); IM <- I(M)
>> validObject(IM)
> Error in .classEnv(classDef) :
>    trying to get slot "package" from an object of a basic class ("NULL") with no slots
>> IM
>   ----------- FAILURE REPORT --------------
>   --- failure: length > 1 in coercion to logical ---
>   --- srcref ---
> :
>   --- package (from environment) ---
> methods
>   --- call from context ---
> showDefault(object)
>   --- call from argument ---
> !is.null(clDef) && isS4(object) && is.na(match(clDef using className,
>      .BasicClasses))
>   --- R stacktrace ---
> where 1: showDefault(object)
> where 2: Error in showDefault(object) :
>    cannot get a slot ("slots") from an object of type "NULL"

The way I interpret this is that 'IM' breaks validObject(). This is not 
exactly the same as saying that 'IM' is invalid.

> 
>> Unfortunately there is a bunch of code around that calls I() on S4
>> objects, admittedly not necessarily for very good reasons, but it
>> happens. Would it be possible that I() has a less destructive effect on
>> S4 objects?
> 
> I'm not sure if this is really desirable... but I may fail to
> see the point of allowing invalid   I(<S4>)  objects as they
> appear in R 4.0.x  ..
> 
> So what do you really propose that  I(.) should be doing, e.g.,
> for  'M'  above ?

To provide some context, people typically use I() when they construct a 
data frame as a way to tell the data frame constructor to treat the 
supplied columns as-is. In Bioconductor we have DataFrame, a 
data-frame-like structure where the columns can be anything, including 
S4 objects, has long they have a vector-like semantic. Like 
data.frame(), the DataFrame() constructor will also treat columns that 
carry the AsIs tag as-is.

A typical use case where this feature is used on an S4 column is to nest 
DataFrame objects. For example, with R 4.0.3:

   library(S4Vectors)
   df <- DataFrame(X=1:3, Y=letters[1:3])

   DataFrame(a=df, Z=11:13)
   # DataFrame with 3 rows and 3 columns
   #         a.X         a.Y         Z
   #   <integer> <character> <integer>
   # 1         1           a        11
   # 2         2           b        12
   # 3         3           c        13

But if we wrap 'df' in I():

   DataFrame(a=I(df), Z=11:13)
   # DataFrame with 3 rows and 2 columns
   #             a         Z
   #   <DataFrame> <integer>
   # 1         1:a        11
   # 2         2:b        12
   # 3         3:c        13

'df' ends up in the 1st column of the returned DataFrame.

AFAICT the fact that wrapping 'df' in I() was producing some kind of 
Frankenstein object that breaks validObject() has never been a problem 
in practice because the Dataframe() constructor immediately removes the 
AsIs tag internally. This restores the original object. Note that this 
is an important difference with data.frame() where the AsIs tag sticks 
around:

   class(DataFrame(a=I(letters))$a)
   # [1] "character"

   class(data.frame(a=I(letters))$a)
   # [1] "AsIs"

Now the new behavior of I() makes a little bit more damage to an S4 
object. In addition to breaking validObject(), like the old behavior 
did, it removes its S4 bit. This means that after removing the AsIs tag, 
the DataFrame() constructor now will also need to repair the object 
(with asS4(), which I just discovered today). This is not too bad and we 
could do that. However the problem would remain that now users get an 
ugly/obscure warning when they do things like:

   DataFrame(a=I(df), Z=11:13).

I can think of 2 ways to move forward:

1. Keep I()'s current implementation but suppress the warning. We'll 
make the necessary adjustments to DataFrame() to repair columns supplied 
as I(<S4>) objects. Note that we would still be in the situation where 
I(<S4>) objects break validObject() but we've been in that situation for 
years and so far we've managed to work around it. However this doesn't 
mean that validObject() shouldn't be fixed. Note that print(I(<S4>)) 
would also need to be fixed (it says "<S4 Type Object>" which is 
misleading). Anyways, these 2 issues are separated from the main issue 
and can be dealt with later.

2. Completely revisit the behavior of I() on S4 objects. Maybe it 
shouldn't touch the class of the object, only add an attribute e.g. 
attr(M, "AsIs") <- TRUE. This would obviously be an important change, 
with the potential to break a lot of existing code. In particular, using 
inherits(x, "AsIs") would no longer be a reliable way to check for the 
presence of the AsIs tag so a dedicated function would probably be 
needed e.g. is.AsIs(). Furthermore, at some point in the future, maybe 
the attribute approach could be extended to everything, not just S4 
objects. Right now I() breaks S4 method dispatch, not only on S4 objects:

   setGeneric("foo", function(x) standardGeneric("foo"))
   setMethod("foo", "dgCMatrix", function(x) 123)

   foo(M)
   # [1] 123

   foo(IM)
   # Error in (function (classes, fdef, mtable)  :
   #   unable to find an inherited method for function ‘foo’ for 
signature ‘"AsIs"’

but also on S3 objects:

   setMethod("foo", "character", function(x) 456)

   foo(letters)
   # [1] 456

   foo(I(letters))
   # Error in (function (classes, fdef, mtable)  :
   #   unable to find an inherited method for function ‘foo’ for 
signature ‘"AsIs"’

So I would argue that the current class-based approach of I() is 
hopelessly broken and that an attribute-based approach would be cleaner.

Thanks,
H.


> 
> Martin
> 
> 
>>
>> Thanks,
>> H.
>>
>>   > sessionInfo()
>> R Under development (unstable) (2020-10-17 r79346)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 20.04.1 LTS
>>
>> Matrix products: default
>> BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
>> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
>>
>> locale:
>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Matrix_1.2-18
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_4.1.0  grid_4.1.0      lattice_0.20-41
>>
>> -- 
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIDAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ej3wnc10LiheZsqjRuonTr2WwHWU4ecaDSrTBFby8wU&s=eSd-I3vlksMkH9SQjlruGL9bsHgTkUb1m7dG3OuVNAw&e=

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


More information about the R-devel mailing list