[Rd] Change to I() in R 4.1
Pages, Herve
hpages using fredhutch.org
Fri Oct 30 07:08:34 CET 2020
Hi Martin,
On 10/26/20 04:52, Martin Maechler wrote:
>>
>> Hi there,
>> Is that change in R-devel intentional?
>>
>> library(Matrix)
>> m <- as(matrix(c(0, 1)), "sparseMatrix")
>>
>> isS4(m)
>> # [1] TRUE
>>
>> x <- I(m)
>> # Warning message:
>> # In `class<-`(x, unique.default(c("AsIs", oldClass(x)))) :
>> # Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...);
>> # result will no longer be an S4 object
>>
>> isS4(x)
>> # [1] FALSE
>>
>> This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the
>> S4 bit of the object.
>>
>> This change breaks 17 Bioconductor packages.
>>
>> Seems that the culprit is this change in how I() is implemented:
>>
>> In R 4.0.3:
>>
>> > I
>> function (x)
>> {
>> structure(x, class = unique(c("AsIs", oldClass(x))))
>> }
>>
>> In R devel:
>>
>> > I
>> function (x)
>> `class<-`(x, unique.default(c("AsIs", oldClass(x))))
>
> Yes, (by me), as I() was sticking out in the slowness bug PR#17794
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17794
>
> and the direct dangerous `call<-` will be replaced happily in
> I()'s definition.
>
> *But* as Luke Tierney had remarked to R-core, direct changing of
> the class of an S4 object has given the above warning for a
> quite while, (svn r47934 | jmc | 2009-02-17 )
> *and* it has rather been an inconsistency in R, that you could
> still use "low-level" means to change the class of an S4 object
> to something invalid.
>
> I really don't think people should be allowed to use I() to
> change a valid S4 object into an invalid one, but this is what
> happens (R 4.0.3 patched) :
>
>> require(Matrix); M <- Matrix(0, 2,3); IM <- I(M)
>> validObject(IM)
> Error in .classEnv(classDef) :
> trying to get slot "package" from an object of a basic class ("NULL") with no slots
>> IM
> ----------- FAILURE REPORT --------------
> --- failure: length > 1 in coercion to logical ---
> --- srcref ---
> :
> --- package (from environment) ---
> methods
> --- call from context ---
> showDefault(object)
> --- call from argument ---
> !is.null(clDef) && isS4(object) && is.na(match(clDef@className,
> .BasicClasses))
> --- R stacktrace ---
> where 1: showDefault(object)
> where 2: Error in showDefault(object) :
> cannot get a slot ("slots") from an object of type "NULL"
The way I interpret this is that 'IM' breaks validObject(). This is not
exactly the same as saying that 'IM' is invalid.
>
>> Unfortunately there is a bunch of code around that calls I() on S4
>> objects, admittedly not necessarily for very good reasons, but it
>> happens. Would it be possible that I() has a less destructive effect on
>> S4 objects?
>
> I'm not sure if this is really desirable... but I may fail to
> see the point of allowing invalid I(<S4>) objects as they
> appear in R 4.0.x ..
>
> So what do you really propose that I(.) should be doing, e.g.,
> for 'M' above ?
To provide some context, people typically use I() when they construct a
data frame, as a way to tell the data frame constructor to treat the
supplied columns as-is. In Bioconductor we have DataFrame, a
data-frame-like structure where the columns can be anything, including
S4 objects, as long as they have vector-like semantics. Like
data.frame(), the DataFrame() constructor also treats columns that
carry the AsIs tag as-is.
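To illustrate the as-is semantics with base R only, here is a small
sketch using a matrix column in data.frame(). The variable names are
made up for the example and DataFrame() is not involved:

```r
# A 3 x 2 matrix supplied as a data.frame column.
m <- matrix(1:6, nrow = 3)

# Without I(), data.frame() splits the matrix into one column per
# matrix column.
spread <- data.frame(id = 1:3, m = m)
names(spread)
# [1] "id"  "m.1" "m.2"

# With I(), the matrix is kept as a single column, as-is.
nested <- data.frame(id = 1:3, m = I(m))
ncol(nested)
# [1] 2
dim(nested$m)
# [1] 3 2

# The AsIs tag stays on the column in a data.frame.
inherits(nested$m, "AsIs")
# [1] TRUE
```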
A typical use case where this feature is used on an S4 column is to nest
DataFrame objects. For example, with R 4.0.3:
library(S4Vectors)
df <- DataFrame(X=1:3, Y=letters[1:3])
DataFrame(a=df, Z=11:13)
# DataFrame with 3 rows and 3 columns
#         a.X         a.Y         Z
#   <integer> <character> <integer>
# 1         1           a        11
# 2         2           b        12
# 3         3           c        13
But if we wrap 'df' in I():
DataFrame(a=I(df), Z=11:13)
# DataFrame with 3 rows and 2 columns
#             a         Z
#   <DataFrame> <integer>
# 1         1:a        11
# 2         2:b        12
# 3         3:c        13
'df' ends up as a single column (the 1st) of the returned DataFrame.
AFAICT the fact that wrapping 'df' in I() was producing some kind of
Frankenstein object that breaks validObject() has never been a problem
in practice because the DataFrame() constructor immediately removes the
AsIs tag internally. This restores the original object. Note that this
is an important difference from data.frame(), where the AsIs tag sticks
around:
class(DataFrame(a=I(letters))$a)
# [1] "character"
class(data.frame(a=I(letters))$a)
# [1] "AsIs"
Now the new behavior of I() does a little more damage to an S4
object. In addition to breaking validObject(), like the old behavior
did, it removes the object's S4 bit. This means that after removing the
AsIs tag, the DataFrame() constructor will now also need to repair the
object (with asS4(), which I just discovered today). This is not too bad
and we could do that. However, the problem would remain that users now
get an ugly/obscure warning when they do things like:
DataFrame(a=I(df), Z=11:13).
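A rough sketch of what such a repair could look like, using a toy S4
class instead of a real DataFrame column. Both strip_asis() and the Toy
class are hypothetical names for illustration, not what S4Vectors
actually does, and whether I() warns or drops the S4 bit depends on the
R version, hence the suppressWarnings():

```r
library(methods)

# Toy S4 class standing in for a real column type (hypothetical).
setClass("Toy", representation(x = "integer"))

# Hypothetical helper: drop the "AsIs" tag from the class attribute,
# then switch the S4 bit back on with asS4().
strip_asis <- function(x) {
  oldClass(x) <- setdiff(oldClass(x), "AsIs")
  asS4(x)
}

obj <- new("Toy", x = 1:3)

# Depending on the R version, I() may warn here; silence it for the demo.
tagged <- suppressWarnings(I(obj))

fixed <- strip_asis(tagged)
isS4(fixed)
# [1] TRUE
fixed@x
# [1] 1 2 3
```

This only hides the warning inside the sketch. A user who calls I() on
an S4 object themselves would still see it.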
I can think of 2 ways to move forward:
1. Keep I()'s current implementation but suppress the warning. We'll
make the necessary adjustments to DataFrame() to repair columns supplied
as I(<S4>) objects. Note that we would still be in the situation where
I(<S4>) objects break validObject(), but we've been in that situation for
years and so far we've managed to work around it. However, this doesn't
mean that validObject() shouldn't be fixed. Note that print(I(<S4>))
would also need to be fixed (it says "<S4 Type Object>", which is
misleading). Anyway, these 2 issues are separate from the main issue
and can be dealt with later.
2. Completely revisit the behavior of I() on S4 objects. Maybe it
shouldn't touch the class of the object, only add an attribute e.g.
attr(M, "AsIs") <- TRUE. This would obviously be an important change,
with the potential to break a lot of existing code. In particular, using
inherits(x, "AsIs") would no longer be a reliable way to check for the
presence of the AsIs tag so a dedicated function would probably be
needed e.g. is.AsIs(). Furthermore, at some point in the future, maybe
the attribute approach could be extended to everything, not just S4
objects. Right now I() breaks S4 method dispatch, not only on S4 objects:
setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "dgCMatrix", function(x) 123)
foo(M)
# [1] 123
foo(IM)
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function ‘foo’ for
# signature ‘"AsIs"’
but also on S3 objects:
setMethod("foo", "character", function(x) 456)
foo(letters)
# [1] 456
foo(I(letters))
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function ‘foo’ for
# signature ‘"AsIs"’
So I would argue that the current class-based approach of I() is
hopelessly broken and that an attribute-based approach would be cleaner.
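A minimal sketch of what an attribute-based tag might look like.
mark_asis() and the Toy class are hypothetical, and is.AsIs() is only
the name suggested above, not an existing R function. The point is that
the class of the object is left alone, so S4 dispatch keeps working:

```r
library(methods)

# Hypothetical attribute-based tagging: the class is never touched.
mark_asis <- function(x) {
  attr(x, "AsIs") <- TRUE
  x
}

# Hypothetical predicate replacing inherits(x, "AsIs").
is.AsIs <- function(x) isTRUE(attr(x, "AsIs", exact = TRUE))

# Toy S4 class and generic, for the demo only.
setClass("Toy", representation(x = "integer"))
setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "Toy", function(x) 123)
setMethod("foo", "character", function(x) 456)

tagged_s4 <- mark_asis(new("Toy", x = 1:3))
tagged_chr <- mark_asis(letters)

# The tag is visible through the dedicated predicate ...
is.AsIs(tagged_s4)
is.AsIs(tagged_chr)

# ... the S4 bit and the class are unchanged ...
isS4(tagged_s4)
class(tagged_chr)

# ... and dispatch still finds the methods.
foo(tagged_s4)
foo(tagged_chr)
```

Code that relies on inherits(x, "AsIs") would not see this tag, which is
the compatibility cost mentioned in option 2.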
Thanks,
H.
>
> Martin
>
>
>>
>> Thanks,
>> H.
>>
>> > sessionInfo()
>> R Under development (unstable) (2020-10-17 r79346)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 20.04.1 LTS
>>
>> Matrix products: default
>> BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
>> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] Matrix_1.2-18
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_4.1.0 grid_4.1.0 lattice_0.20-41
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319