[Bioc-devel] Possible to export coerce2() from S4Vectors?

Bemis, Kylie k.bemis at northeastern.edu
Wed Nov 14 18:57:44 CET 2018


Yes, I will make sure my cbind() implementation coerces to the correct subclass.

That could solve my error as well, but the warnings about S4 dispatch on “...” are still a problem.

-Kylie

On Nov 14, 2018, at 12:38 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:

The use of c() in the implementation of [[<- is problematic: [[<- has the semantics of insertion, preserving the overall structure of x, whereas c() combines two or more peer data structures, and it is difficult to define the correct logic through dispatch.

The dispatch on ... is not well documented. I will try to improve that, as soon as I understand it myself. But no matter what, your cbind() method will need to uplift ordinary DataFrames to IndexedDataFrame.
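
A minimal sketch of the uplifting described above, assuming toy classes built only on the methods package: "Table" stands in for DataFrame and "IndexedTable" for IndexedDataFrame. The names uplift() and bindIndexed() are hypothetical and are not part of S4Vectors; a real cbind() method would also need to check that row counts agree.

```r
library(methods)

# Hypothetical stand-ins: "Table" plays the role of DataFrame and
# "IndexedTable" the role of IndexedDataFrame.
setClass("Table", slots = c(cols = "list"))
setClass("IndexedTable", contains = "Table", slots = c(ids = "numeric"))

# Uplift a plain Table to IndexedTable, borrowing ids from a template object.
uplift <- function(x, template) {
    if (is(x, "IndexedTable")) {
        x
    } else {
        new("IndexedTable", x, ids = template@ids)
    }
}

# Column-bind that uplifts first and only then compares ids
# (an ordinary function for illustration, not a cbind() method).
bindIndexed <- function(...) {
    args <- list(...)
    indexed <- vapply(args, is, logical(1), "IndexedTable")
    if (!any(indexed))
        stop("at least one argument must be an IndexedTable")
    template <- args[[which(indexed)[1L]]]
    args <- lapply(args, uplift, template = template)
    ok <- vapply(args, function(a) identical(a@ids, template@ids), logical(1))
    if (!all(ok))
        stop("ids must match")
    cols <- do.call(c, lapply(args, function(a) a@cols))
    new("IndexedTable", cols = cols, ids = template@ids)
}

idx <- new("IndexedTable", cols = list(a = 1:3), ids = c(0.1, 0.2, 0.3))
plain <- new("Table", cols = list(b = 4:6))
res <- bindIndexed(idx, plain)
```

Because the plain object is uplifted before the ids are compared, the check never reaches for a slot that a parent-class object does not have.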

Michael

On Wed, Nov 14, 2018 at 7:52 AM Bemis, Kylie <k.bemis at northeastern.edu> wrote:
Hi Michael,

Here is a simple example of what I’m trying to do:

library(S4Vectors)

setClass("IndexedDataFrame",
    contains="DataFrame",
    slots=c(ids="numeric"))

# track additional ID metadata w/ special rules
IndexedDataFrame <- function(ids, ...) {
    x <- DataFrame(...)
    new("IndexedDataFrame",
        ids=ids,
        rownames=rownames(x),
        nrows=nrow(x),
        listData=x@listData,
        elementMetadata=mcols(x))
}

# check for matching IDs before cbind-ing
setMethod("cbind", "IndexedDataFrame",
    function(...) {
        args <- list(...)
        ids <- args[[1L]]@ids
        ok <- vapply(args, function(a) {
            # check for compatible IDs
            identical(a@ids, ids)
        }, logical(1))
        if ( !all(ok) )
            stop("ids must match")
        x <- callNextMethod(...)
        new(class(args[[1L]]),
            ids=ids,
            rownames=rownames(x),
            nrows=nrow(x),
            listData=x@listData,
            elementMetadata=mcols(x))
    })

set.seed(1)
idf <- IndexedDataFrame(ids=runif(10), a=1:10, b=11:20)
idf$c <- 21:30

Error in identical(a@ids, ids) :
  no slot of name "ids" for this object of class "DataFrame"
In addition: Warning message:
In methods:::.selectDotsMethod(classes, .MTable, .AllMTable) :
  multiple direct matches: "IndexedDataFrame", "DataFrame"; using the first of these

Specific examples where I use this pattern are new MassDataFrame and PositionDataFrame classes in Cardinal, which require associated m/z-values and pixel coordinates as additional metadata. Current source code is here:

https://github.com/kuwisdelu/Cardinal/blob/master/R/methods2-MassDataFrame.R
https://github.com/kuwisdelu/Cardinal/blob/master/R/methods2-PositionDataFrame.R

In older versions of Cardinal, similar classes extended AnnotatedDataFrame and stored this metadata in regular columns that had to follow a specific naming scheme. This proved fragile and difficult to maintain, so I now store the metadata in slots, where it can be validated independently of whatever user-supplied columns exist.
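
As a rough illustration of slot-based validation, here is a sketch with a toy class on the methods package only. The class name "IdTable" and its slots are hypothetical and are not the Cardinal classes; "cols" stands in for user-supplied columns and "ids" for the per-row metadata.

```r
library(methods)

# Hypothetical toy class: "cols" holds user-supplied columns,
# "ids" holds per-row metadata in its own slot.
setClass("IdTable", slots = c(cols = "list", ids = "numeric"))

# The validity check looks only at the slot, never at column names.
setValidity("IdTable", function(object) {
    nrows <- if (length(object@cols)) length(object@cols[[1L]]) else 0L
    if (length(object@ids) != nrows)
        return("length of 'ids' must equal the number of rows")
    if (anyNA(object@ids))
        return("'ids' must not contain NA")
    TRUE
})

# A user column that happens to be called "ids" does not interfere.
ok <- new("IdTable", cols = list(ids = letters[1:3]), ids = c(10, 20, 30))

# Mismatched metadata is rejected when the object is created.
bad <- tryCatch(
    new("IdTable", cols = list(a = 1:3), ids = c(10, 20)),
    error = function(e) conditionMessage(e))
```

Here `bad` holds the error message rather than an object, since new() runs the validity check when slots are supplied.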

Kylie

~~~
Kylie Ariel Bemis
College of Computer and Information Science
Northeastern University
https://kuwisdelu.github.io





On Nov 14, 2018, at 10:12 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:

I don't want to derail this thread, but why is coerce2() necessary? Would it be possible to fold its logic into as() without breaking too much?

Kylie,

It would help to see your code, with some pointers to where things break.

Michael

On Wed, Nov 14, 2018 at 5:36 AM Bemis, Kylie <k.bemis at northeastern.edu> wrote:
Hi Herve,

Thanks for the detailed reply. Using as() makes sense. Unfortunately my use case makes it a little more complicated.

The issue comes from a combination of factors:

- My DataFrame subclasses track additional metadata for each row, separate from the typical user-defined columns
- This metadata is checked to decide how to do cbind(...) or if cbind(...) makes sense for those objects
- cbind(...) ends up being called internally by some inherited assignment methods like [[<-
- Coercing to my subclass with as() results in incompatible metadata, causing cbind(...) to fail

I see a few solutions:

1. Using coerce2() works where as() doesn’t, because it takes an example of the “to” object rather than just the class, so compatible metadata can be copied directly from the “to” object, allowing cbind(...) to work as intended.

2. Create an exception to my class logic that allows the metadata to be missing, and change my cbind(...) implementation to ignore the metadata in the case that it is missing.

3. Supply my own version of methods like [[<-. I don’t like this one, since it should be unnecessary.

I can do (2), but I would need to rethink some of my other methods that expect that metadata to exist, so I wanted to check on the plans for coerce2() before making those changes.

What are your thoughts?

Thanks!
Kylie






On Nov 13, 2018, at 8:55 PM, Pages, Herve <hpages at fredhutch.org> wrote:


Hi Kylie,

I've modified coerce2() in S4Vectors 0.21.5 so that `coerce2(from, to)` should now do the right thing when 'to' is a DataFrame derivative:

  https://github.com/Bioconductor/S4Vectors/commit/48e11dd2c8d474c63e09a69ee7d2d2ec35d7307a

With the following gotcha: this will work only if coercion (with as()) from DataFrame to the DataFrame derivative does the right thing. So I'm assuming that this coercion makes sense and can be supported. There are 2 possible situations:

1) The automatic coercion method from DataFrame to your DataFrame derivative (i.e. the coercion method automatically defined by the methods package) does the right thing. In this case coerce2() (and therefore [[<-) will also do the right thing on your DataFrame derivatives. For example:

  library(S4Vectors)
  setClass("MyDataFrameExtension", contains="DataFrame")

  ## WARNING: Don't trust selectMethod() here!
  selectMethod("coerce", c("DataFrame", "MyDataFrameExtension"))
  # Error in selectMethod("coerce", c("DataFrame", "MyDataFrameExtension")) :
  #  no method found for signature DataFrame, MyDataFrameExtension

  as(DataFrame(), "MyDataFrameExtension")
  # MyDataFrameExtension with 0 rows and 0 columns

  ## The automatic coercion method is only created the 1st time it's used!
  ## So now selectMethod() shows it:
  selectMethod("coerce", c("DataFrame", "MyDataFrameExtension"))
  # Method Definition:
  #
  # function (from, to = "MyDataFrameExtension", strict = TRUE)
  # {
  #     obj <- new("MyDataFrameExtension")
  #     as(obj, "DataFrame") <- from
  #     obj
  # }
  # <environment: namespace:S4Vectors>
  #
  # Signatures:
  #         from        to
  # target  "DataFrame" "MyDataFrameExtension"
  # defined "DataFrame" "MyDataFrameExtension"


  MDF <- new("MyDataFrameExtension")
  S4Vectors:::coerce2(list(aa=1:3, bb=21:23), MDF)
  # MyDataFrameExtension with 3 rows and 2 columns
  #          aa        bb
  #   <integer> <integer>
  # 1         1        21
  # 2         2        22
  # 3         3        23


2) The automatic coercion method from DataFrame to your DataFrame derivative doesn't do the right thing (e.g. it returns an invalid object). In this case you need to define this coercion (with a setAs() statement). This will allow coerce2() (and therefore [[<-) to do the right thing on your DataFrame derivatives.
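
A minimal sketch of such a setAs() statement, assuming toy classes on the methods package only ("Table" stands in for DataFrame and "IndexedTable" for the derivative). The helper nrowsOf() and the default ids of 1..n are hypothetical choices made for illustration, not anything S4Vectors prescribes.

```r
library(methods)

# Hypothetical stand-ins for DataFrame and a derivative with extra metadata.
setClass("Table", slots = c(cols = "list"))
setClass("IndexedTable", contains = "Table", slots = c(ids = "numeric"))

# Number of rows, taken from the first column (0 for an empty table).
nrowsOf <- function(x) if (length(x@cols)) length(x@cols[[1L]]) else 0L

# Explicit parent-to-child coercion. It takes the place of the automatic
# method, which would leave 'ids' at its empty prototype value.
setAs("Table", "IndexedTable", function(from) {
    new("IndexedTable", from, ids = as.numeric(seq_len(nrowsOf(from))))
})

tab <- new("Table", cols = list(a = 1:3, b = 4:6))
itab <- as(tab, "IndexedTable")
```

With the explicit method in place, as() fills in metadata whose length matches the data, so anything that relies on as() gets a consistent object back.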

There is no plan at the moment to export coerce2() because this should not be needed. The idea is that developers should not need to define "coerce2" methods but instead make it work via the addition of the appropriate coercion methods. The only purpose of coerce2() is to support things like [[<- and endoapply(). Once coerce2() works properly, these things work out-of-the-box.

So to summarize: just make sure that a DataFrame can be coerced to your DataFrame derivative and [[<- and endoapply() will work out-of-the-box. It could be however that this coercion doesn't make sense and cannot be supported, in which case, we'll need to do something different. Let me know if that's the case.

H.


On 11/13/18 12:45, Bemis, Kylie wrote:

Dear all,

Are there any plans to export coerce2() from the S4Vectors namespace, like other exported internal utilities such as showAsCell() and setListElement()?

I have a couple of classes that inherit from DataFrame, and some inherited methods (like [[<-) break in certain situations due to calls to coerce2() that coerce arguments to a regular DataFrame (instead of my subclass). This could be fixed if I were able to implement a coerce2() method for my subclass.

Any suggestions on how to approach problems like this when inheriting from DataFrame and other Vector derivatives?

Many thanks,
Kylie


_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


