[BioC] subsetting SummarizedExperiments - a proposal (or a hack?)
Cook, Malcolm
MEC at stowers.org
Wed Feb 5 23:11:28 CET 2014
Martin,
Google says: 'Your search - "subset.SummarizedExperiment" - did not match any documents.'
So I offer the below for your consideration, along with a few examples and a quick benchmark comparison with using the more explicit `[` extraction operator.
subset.SummarizedExperiment<-function(x
,rowSubset=TRUE
,colSubset=TRUE
,assaySubset=TRUE
,drop=FALSE
,provenanceTrack=FALSE
,...) {
## PURPOSE: implement subsetting of SummarizedExperiments,
## allowing expressions in terms of:
##
## + the GRanges (and its mcols meta-data) held in rowData
## + the experimental meta-data held in the colData DataFrame
## + the names of the assays
x<-x[
##eval(as.expression(substitute(row)),mcols(rowData(x)),.GlobalEnv)
eval(as.expression(substitute(rowSubset)),as.data.frame(rowData(x)),.GlobalEnv)
,eval(as.expression(substitute(colSubset)),colData(x),.GlobalEnv)
,drop=drop,...]
if (! identical(TRUE,assaySubset)) assays(x)<-assays(x)[assaySubset]
if(provenanceTrack) {
exptData(x)$rowSubset<-c(exptData(x)$rowSubset,as.character(substitute(rowSubset)))
exptData(x)$colSubset<-c(exptData(x)$colSubset,as.character(substitute(colSubset)))
}
x
}
attr(subset,'ex')<-function() {
example(SummarizedExperiment)
assays(se1)$a2<-assays(se1)$counts*2
assays(se1)$a3<-assays(se1)$counts*3
benchmark(replications=100
,se1.ss1<-se1[start(rowData(se1))==344757,se1$Treatment=='ChIP']
,se1.ss2<-subset(se1,start==344757,Treatment=='ChIP')
)
stopifnot(identical(assays(se1.ss1),assays(se1.ss2)))
se1.ss3<-subset(se1,strand=='+',Treatment=='ChIP',c('a2','a3'))
stopifnot(identical(c('a2','a3'),names(assays(se1.ss3))))
}
The rbenchmarks shows the cost of overhead in this contrived minimal example. YMMV:
test replications elapsed relative user.self sys.self user.child sys.child
3 c("a2", "a3") 100 0.001 1 0.001 0.000 0 0
1 se1.ss1 <- se1[start(rowData(se1)) == 344757, se1$Treatment == "ChIP"] 100 3.113 3113 3.111 0.001 0 0
2 se1.ss2 <- subset(se1, start == 344757, Treatment == "ChIP") 100 3.486 3486 3.486 0.000 0 0
Is all this in the spirit of things as you see it? It sure makes my use of SE "scan" (erhm, like poetry;)
if you like, I also have castLong and castWide functions to reshape a (subsetted) SE as a data.table for use in ggplot and friends.
You like?
Cheers,
~Malcolm
More information about the Bioconductor
mailing list