[Rd] How to use `[` without evaluating the arguments.
Eeles, Christopher
Chr|@topher@Ee|e@ @end|ng |rom uhnre@e@rch@c@
Thu Sep 24 23:45:26 CEST 2020
Hello R-devel,
I am currently attempting to implement an API similar to data.table wherein single bracket subsetting can accept an unquoted expression to be evaluated in the context of my object.
A simple example from the data.table package looks like this:
DT <- data.table(col1 = c('a', 'b', 'c'), col2 = c('x', 'y', 'z'))
DT[col1 == 'a']
Where the expression i in DT[i, j] is captured with substitute then evaluated inside the DT object.
Reviewing the source code from data.table, it seems that they implemented this feature simple by defining a new S3 method on `[` called `[.data.table`. I tried to replicate this API as follows.
I have defined an S4 which contains an S3 class as follows:
#' Define an S3 Class
#'
#' Allows use of S3 methods with new S4 class. This is required to overcome
#' limitations of the `[` S4 method.
#'
setOldClass('long.table')
#' LongTable class definition
#'
#' Define a private constructor method to be used to build a `LongTable` object.
#'
#' @param drugs [`data.table`]
#' @param cells [`data.table`]
#' @param assays [`list`]
#' @param metadata [`list`]
#'
#'
#' @return [`LongTable`] object containing the assay data from a
#'
#' @import data.table
#' @keywords internal
.LongTable <- setClass("LongTable",
slots=list(rowData='data.table',
colData='data.table',
assays='list',
metadata='list',
.intern='environment'),
contains='long.table')
#' LongTable constructor method
#'
#' @param rowData [`data.table`, `data.frame`, `matrix`] A table like object
#' coercible to a `data.table` containing the a unique `rowID` column which
#' is used to key assays, as well as additional row metadata to subset on.
#' @param rowIDs [`character`, `integer`] A vector specifying
#' the names or integer indexes of the row data identifier columns. These
#' columns will be pasted together to make up the row.names of the
#' `LongTable` object.
#' @param colData [`data.table`, `data.frame`, `matrix`] A table like object
#' coercible to a `data.table` containing the a unique `colID` column which
#' is used to key assays, as well as additional column metadata to subset on.
#' @param colIDs [`character`, `integer`] A vector specifying
#' the names or integer indexes of the col data identifier columns. These
#' columns will be pasted together to make up the col.names of the
#' `LongTable` object.
#' @param assays A [`list`] containing one or more objects coercible to a
#' `data.table`, and keyed by rowID and colID corresponding to the rowID and
#' colID columns in colData and rowData.
#' @param metadata A [`list`] of metadata associated with the `LongTable`
#' object being constructed
#' @param keep.rownames [`logical` or `character`] Logical: whether rownames
#' should be added as a column if coercing to a `data.table`, default is FALSE.
#' If TRUE, rownames are added to the column 'rn'. Character: specify a custom
#' column name to store the rownames in.
#'
#' @return [`LongTable`] object
#'
#' @import data.table
#' @export
LongTable <- function(rowData, rowIDs, colData, colIDs, assays,
metadata=list(), keep.rownames=FALSE) {
## TODO:: Handle missing parameters
if (!is(colData, 'data.table')) {
colData <- data.table(colData, keep.rownames=keep.rownames)
}
if (!is(rowData, 'data.table')) {
rowData <- data.table(rowData, keep.rownames=keep.rownames)
}
if (!all(vapply(assays, FUN=is.data.table, FUN.VALUE=logical(1)))) {
tryCatch({
assays <- lapply(assays, FUN=data.table, keep.rownames=keep.rownames)
}, warning = function(w) {
warning(w)
}, error = function(e, assays) {
message(e)
types <- lapply(assays, typeof)
stop(paste0('List items are types: ',
paste0(types, collapse=', '),
'\nPlease ensure all items in the assays list are
coerced to data.tables!'))
})
}
# Initialize the .internals object to store private metadata for a LongTable
internals <- new.env()
## TODO:: Implement error handling
internals$rowIDs <-
if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData))
rowIDs
else
which(colnames(rowData) %in% rowIDs)
lockBinding('rowIDs', internals)
internals$colIDs <-
if (is.numeric(colIDs) && max(colIDs) < ncol(colData))
colIDs
else
which(colnames(colData) %in% colIDs)
lockBinding('colIDs', internals)
# Assemble the pseudo row and column names for the LongTable
.pasteColons <- function(...) paste(..., collapse=':')
rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$rowIDs]
colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$colIDs]
return(.LongTable(rowData=rowData, colData=colData,
assays=assays, metadata=metadata,
.intern=internals))
}
I have also defined a subset method as an S3 and S4 generic:
#' Subset method for a LongTable object.
#'
#' Allows use of the colData and rowData `data.table` objects to query based on
#' rowID and colID, which is then used to subset all value data.tables stored
#' in the dataList slot.
#'
#' This function is endomorphic, it always returns a LongTable object.
#'
#' @param x [`LongTable`] The object to subset.
#' @param rowQuery [`character`, `numeric`, `logical` or `expression`]
#' Character: pass in a character vector of drug names, which will subset the
#' object on all row id columns matching the vector.
#'
#' Numeric or Logical: these select based on the rowKey from the `rowData`
#' method for the `LongTable`.
#'
#' Expression: Accepts valid query statements to the `data.table` i parameter,
#' this can be used to make complex queries using the `data.table` API
#' for the `rowData` data.table.
#'
#' @param columnQuery [`character`, `numeric`, `logical` or `expression`]
#' Character: pass in a character vector of drug names, which will subset the
#' object on all drug id columns matching the vector.
#'
#' Numeric or Logical: these select base don the rowID from the `rowData`
#' method for the `LongTable`.
#'
#' Expression: Accepts valid query statements to the `data.table` i parameter,
#' this can be used to make complex queries using the `data.table` API
#' for the `rowData` data.table.
#'
#' @param values [`character`, `numeric` or `logical`] Optional list of value
#' names to subset. Can be used to subset the dataList column further,
#' returning only the selected items in the new LongTable.
#'
#' @return [`LongTable`] A new `LongTable` object subset based on the specified
#' parameters.
#'
#' @importMethodsFrom BiocGenerics subset
#' @import data.table
#' @export
subset.long.table <- function(x, rowQuery, columnQuery, assays) {
longTable <- x
rm(x)
if (!missing(rowQuery)) {
if (tryCatch(is.character(rowQuery), error=function(e) FALSE)) {
select <- grep('^cellLine[:digit:]*', colnames(rowData(longTable)), value=TRUE)
rowQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(rowQuery)), collapse=' | ')
rowQuery <- str2lang(rowQueryString)
} else {
rowQuery <- substitute(rowQuery)
}
rowDataSubset <- rowData(longTable)[eval(rowQuery), ]
} else {
rowDataSubset <- rowData(longTable)
}
if (!missing(columnQuery)) {
if (tryCatch(is.character(columnQuery), error=function(e) FALSE)) {
select <- grep('^drug[:digit:]*', colnames(colData(longTable)), value=TRUE)
columnQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(columnQuery)), collapse=' | ')
columnQuery <- str2lang(columnQueryString)
} else {
columnQuery <- substitute(columnQuery)
}
colDataSubset <- colData(longTable)[eval(columnQuery), ]
} else {
colDataSubset <- colData(longTable)
}
rowKeys <- rowDataSubset$rowKey
colKeys <- colDataSubset$colKey
if (missing(assays)) { assays <- assayNames(longTable) }
keepAssays <- assayNames(longTable) %in% assays
assayData <- lapply(assays(longTable)[keepAssays],
FUN=.filterLongDataTable,
indexList=list(rowKeys, colKeys))
return(LongTable(colData=colDataSubset, colIDs=longTable using .intern$colIDs ,
rowData=rowDataSubset, rowIDs=longTable using .intern$rowIDs,
assays=assayData, metadata=metadata(longTable)))
}
setMethod('subset', 'LongTable', subset.long.table)
Everything behaves as I expect when calling the subset function. For example
subset(longTable, cellLine1 == 'VCAP)
Successfully returns while also working with character, integer or boolean based indexing.
The issue arises when I try to implement the '[' method. I have tried a number of different approaches, but none of them has been successful. My current approach is as follows:
`[.long.table` <- function(x, i, j) eval(substitute(subset(x, i, j)))
This function works as expected in most cases, for example.
longTable[c(1,2,3), c(1,2,3,]
`[.long.table`(longTable, cellLine1 == 'VCAP')
Both work normally.
However, when I try using `[` like an operator:
longTable[cellLine1 == 'VCAP', ]
I get the error 'Error: object 'cellLine1' not found'.
This suggests to me that instead of passing the expression into the function `[`, it is trying to evaluate the expression before dispatching a method.
Given that similar syntax works fine with data.table, and I believe also in the tibble tidyverse package, I am quite confused.
If you have any recommendations on how I can prevent evaluation prior to method dispatch, or of a work around that would produce the same API using a different approach, it would be appreciated.
Thanks for your assistance.
Best,
---
Christopher Eeles
Software Developer
BHK Laboratory<http://www.bhklab.ca/>
Princess Margaret Cancer Centre<https://www.pmgenomics.ca/pmgenomics/>
University Health Network<http://www.uhn.ca/>
This e-mail may contain confidential and/or privileged i...{{dropped:22}}
More information about the R-devel
mailing list