[BioC] custom subset method / handling columns selection as logic in '...' parameter
Martin Morgan
mtmorgan at fhcrc.org
Thu Dec 20 15:46:22 CET 2007
Eric --
Please don't cross-post.
Please simplify your example so that others do not have to work hard
to understand what you are asking.
See additional comments below.
"Eric Lecoutre" <ericlecoutre at gmail.com> writes:
> Dear R-helpers & Bioconductor
>
>
> Sorry for cross-posting; this concerns R programming applied in a
> Bioconductor context.
> Also sorry for this long message; I have tried to be complete in my request.
>
> I am trying to write a subset method for a specific class (ExpressionSet
> from Bioconductor) allowing more flexible selection than the "[" method.
>
> The scheme I have in mind is the following:
>
> subset.ExpressionSet <- function(x,subset,...){
>
> }
ExpressionSet is an S4 class with S4 methods; you will get into
trouble mixing S3 (implied above) and S4.
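To illustrate the S4 route without needing Biobase, here is a minimal sketch that defines a hypothetical toy class, ToySet (an exprs matrix with features in rows plus a pheno data.frame with one row per sample, loosely mimicking ExpressionSet), and registers an S4 method for subset with setGeneric()/setMethod() rather than an S3 subset.ToySet function. The class, its slots and the method body are assumptions made for illustration; a real ExpressionSet method would use the Biobase accessors instead.

```r
library(methods)

# Hypothetical toy stand-in for ExpressionSet: features in rows of 'exprs',
# one row of 'pheno' per sample (column of 'exprs').
setClass("ToySet", slots = c(exprs = "matrix", pheno = "data.frame"))

# Turn base::subset into an S4 generic, then add a method for ToySet.
setGeneric("subset")
setMethod("subset", "ToySet", function(x, subset, ...) {
  # rows (features) to keep; missing means keep all of them
  if (missing(subset)) subset <- rep(TRUE, nrow(x@exprs))
  subset <- subset & !is.na(subset)
  new("ToySet", exprs = x@exprs[subset, , drop = FALSE], pheno = x@pheno)
})

toy <- new("ToySet",
           exprs = matrix(1:6, nrow = 3,
                          dimnames = list(c("g1", "g2", "g3"), c("A", "B"))),
           pheno = data.frame(sex = c("Female", "Male"), row.names = c("A", "B")))

small <- subset(toy, c(TRUE, FALSE, TRUE))
rownames(small@exprs)   # "g1" "g3"
```

The method adds a subset argument after x; the methods package allows this because the generic created from base::subset has a '...' argument.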
> I will use the subset argument for rows (genes), as in the default method.
>
> Now I would like to allow selecting different columns (features) based on
> phenotypic data.
> Phenotypic data provides detailed information about the columns.
Columns of an ExpressionSet are samples / phenotypes; rows are features.
> Basically, first function I have written allows the following:
>
>> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, v2=value2)
> # subset=NULL takes all rows
>
> Here there are two conditions on two variables belonging to the data.frame
> encapsulated in the ExpressionSetObject (to be complete, the conditions
> may involve more than 2 columns, as they are applied to the phylogenic
> data.frame that describes all variables).
'phylogenic' is not part of the terminology; you are perhaps aiming
for 'phenotypic'?
> To simplify a little bit, this would nearly return:
> ExpressionSetObject[,V1==value & V2==value]
The usual idiom is exactly this; if e is an ExpressionSet instance:
    e[,e$V1==value1 & e$V2==value2]
'$' is defined to access the variables (columns) of the phenoData slot
of the ExpressionSet.
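The same idiom can be seen on plain R objects. In this sketch a matrix and a data.frame stand in for exprs(e) and pData(e) (one pheno row per sample column, using the first four samples of the example pData shown further down). These stand-ins are assumptions; with a real ExpressionSet the call is simply e[, e$sex == "Male" & e$type == "Control"].

```r
# Stand-in for pData(e): one row per sample, sample names as row names
pd <- data.frame(sex  = c("Female", "Male", "Male", "Male"),
                 type = c("Control", "Case", "Control", "Case"),
                 row.names = c("A", "B", "C", "D"))

# Stand-in for exprs(e): features in rows, samples in columns (same order as pd)
m <- matrix(seq_len(8), nrow = 2,
            dimnames = list(c("g1", "g2"), rownames(pd)))

# A logical vector over samples selects columns, as in
# e[, e$sex == "Male" & e$type == "Control"]
keep <- pd$sex == "Male" & pd$type == "Control"
kept <- m[, keep, drop = FALSE]
colnames(kept)   # "C"
```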
> This is nice, as I can already handle any number of conditions on variable
> values thanks to '...'. The first step is conditions <- list(...); the
> conditions are then handled later in the code.
>
> Nevertheless, those conditions are basic (one value).
>
> I would like to handle arbitrary conditions, such as V1 %in% c(value1,
> value2). Simpler conditions would then be passed as V2==value2 instead of
> V2=value2.
>
> My real problem is that I don't know how to turn '...' into an object
> containing those conditions that could be used later.
I get confused here; can you clarify? (This means 'make it simpler',
not 'make it longer'.) In the future, if this is where your question
is, it would be appropriate to formulate it without involving
ExpressionSet, and to post it to the R mailing list.
> My attempt which seems the nearest is:
>
>> foo <- function(...){
>> as.expression(substitute(list(...)))
>> }
>>foo(x==1,y%in%1:2)
> expression(list(x == 1, y %in% 1:2))
>
> whereas I would like to have something like
> list(expression(x==1), expression(y %in% 1:2)),
> with those expressions being evaluated later on in the context of my
> specific object.
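On the '...' question itself: list(...) forces every argument, so x == 1 is evaluated immediately in the caller's environment (and fails if there is no x there). The arguments can instead be captured unevaluated with substitute(). A minimal sketch, where the name capture_conditions is hypothetical, turns '...' into a list of calls, one per argument; wrapping each in as.expression() gives exactly the list(expression(...), ...) form asked for. match.call(expand.dots = FALSE)$... returns a similar list.

```r
# Hypothetical helper: return the arguments in '...' as unevaluated calls.
capture_conditions <- function(...) {
  # substitute() replaces '...' with the argument expressions without
  # evaluating them; [-1L] drops the leading 'list' symbol of the call.
  as.list(substitute(list(...)))[-1L]
}

conds <- capture_conditions(x == 1, y %in% 1:2)
conds[[1]]   # x == 1
conds[[2]]   # y %in% 1:2

# Same content as list(expression(x == 1), expression(y %in% 1:2))
as_expr <- lapply(conds, as.expression)

# Later, each condition can be evaluated against some data, e.g. a list:
eval(conds[[2]], list(y = 3))   # FALSE
```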
>
>
> Are there any existing functions that already handle '...' the way I
> want, so that I can mimic them?
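base R's subset.data.frame() does this for a single condition: it captures its subset argument with substitute() and evaluates it with eval(), using the data frame as the environment and parent.frame() as the enclosure, then treats NA as FALSE. transform.data.frame() applies a similar eval(substitute(list(...)), ...) pattern to '...'. A sketch extending the subset.data.frame() approach to any number of conditions, ANDed together, might look like this (the helper name select_rows and the data are illustrative assumptions):

```r
# Hypothetical helper in the style of subset.data.frame(), for any number of
# conditions: each one is evaluated in 'data' (columns act as variables),
# with the caller's environment as fallback for other names, then ANDed.
select_rows <- function(data, ...) {
  conds <- as.list(substitute(list(...)))[-1L]
  env <- parent.frame()
  if (length(conds) == 0L) return(rep(TRUE, nrow(data)))
  hits <- lapply(conds, function(cond) {
    r <- eval(cond, data, env)
    if (!is.logical(r)) stop("condition is not logical: ", deparse(cond))
    r & !is.na(r)   # treat NA as FALSE, as subset.data.frame() does
  })
  Reduce(`&`, hits)
}

# First four samples of the example pData
pd <- data.frame(sex   = c("Female", "Male", "Male", "Male"),
                 type  = c("Control", "Case", "Control", "Case"),
                 score = c(0.75, 0.40, 0.73, 0.42),
                 row.names = c("A", "B", "C", "D"))

select_rows(pd, sex == "Male", score > 0.5)   # FALSE FALSE TRUE FALSE
cutoff <- 0.7
select_rows(pd, score > cutoff)               # TRUE FALSE TRUE FALSE
```

With an ExpressionSet, the resulting logical vector would index the columns, e.g. e[, select_rows(pData(e), sex == "Male", score > 0.75)].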
>
> Thanks for any insight.
>
>
> Eric
>
> ---
>
> For those who have Biobase available, here is my current subset function
> and a demo case that explains things a bit further.
>
>
> library(Biobase)
> example(ExpressionSet) # create sample object
> print(expressionSet)
>
> # now my subset function as it is
>
> subset.ExpressionSet <- function(x, subset = NULL, verbose = TRUE, ...) {
>   # subset selects rows (features); NULL keeps all of them
>   # ... holds named criteria on columns (samples), matched against pData(x)
>   stopifnot(is(x, "ExpressionSet"))
>   phenoData <- pData(x)
>   listCriteria <- list(...)
>   if (is.null(subset)) subset <- rep(TRUE, nrow(exprs(x)))
>   subset <- subset & !is.na(subset)
>   retainedCriteria <- list()
>   for (critname in names(listCriteria)) {
>     if (!critname %in% colnames(phenoData)) {
>       if (verbose) cat("\n*** subsetCompounds: Dropped criteria:", critname,
>                        "not in phenoData of object\n")
>       next
>     }
>     values <- listCriteria[[critname]]
>     # a NULL criterion accepts every observed value of that variable
>     if (is.null(values)) values <- unique(phenoData[, critname])
>     retainedCriteria[[critname]] <- phenoData[, critname] %in% values
>   }
>   if (length(retainedCriteria) == 0) {
>     keep <- rep(TRUE, nrow(phenoData))  # no usable criteria: keep all samples
>   } else {
>     # keep samples satisfying every criterion, as in the output shown below
>     keep <- apply(do.call("cbind", retainedCriteria), 1, all)
>   }
>   selectedColumns <- rownames(phenoData)[keep]
>   out <- x[subset, selectedColumns]
>   if (verbose) cat("\n", length(selectedColumns), " columns selected (",
>                    paste(selectedColumns, collapse = " "), ")\n", sep = "")
>   invisible(out)
> }
>
> # looking at phenotypic data associated with the sample expressionSet
>> pData(expressionSet)
> sex type score
> A Female Control 0.75
> B Male Case 0.40
> C Male Control 0.73
> D Male Case 0.42
> E Female Case 0.93
> F Male Control 0.22
> G Male Case 0.96
> H Male Case 0.79
> I Female Case 0.37
> J Male Control 0.63
> K Male Case 0.26
> L Female Control 0.36
> M Male Case 0.41
> N Male Case 0.80
> O Female Case 0.10
> P Female Control 0.41
> Q Female Case 0.16
> R Male Control 0.72
> S Male Case 0.17
> T Female Case 0.74
> U Male Control 0.35
> V Female Control 0.77
> W Male Control 0.27
> X Male Control 0.98
> Y Female Case 0.94
> Z Female Case 0.32
>
>
> # now the sample use
>> (subset1 =subset(expressionSet,sex="Male",type="Control"))
> 7 columns selected (C F J R U W X)
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 500 features, 7 samples
> element names: exprs, se.exprs
> phenoData
> sampleNames: C, F, ..., X (7 total)
> varLabels and varMetadata description:
> sex: Female/Male
> type: Case/Control
> score: Testing Score
> featureData
> featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (500 total)
> fvarLabels and fvarMetadata description: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu95av2
>
>
> # what I would like to allow in use:
> (subset2 = subset(expressionSet, sex=="Male", score > 0.75)) # note the == instead of =
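Note that with the signature function(x, subset=NULL, verbose=TRUE, ...), this call would never reach '...': the unnamed sex=="Male" is matched by position to subset, and score > 0.75 to verbose. One possible arrangement (an assumption, not the poster's design) puts '...' directly after the data, so that any formal after it can only be matched by name. The sketch again uses a plain matrix and data.frame in place of an ExpressionSet, and the name subset_samples is hypothetical.

```r
# Hypothetical helper: sample conditions come right after the data, so the
# formals after '...' (rows, verbose) can only be matched by name.
subset_samples <- function(exprs, pheno, ..., rows = TRUE, verbose = FALSE) {
  env <- parent.frame()                        # where free variables are looked up
  conds <- as.list(substitute(list(...)))[-1L]
  keep <- rep(TRUE, nrow(pheno))
  for (cond in conds) {
    r <- eval(cond, pheno, env)                # evaluate against the sample data
    keep <- keep & !is.na(r) & r               # NA counts as FALSE
  }
  if (verbose) message(sum(keep), " samples selected")
  exprs[rows, keep, drop = FALSE]
}

# First four samples of the example pData, and a matching toy matrix
pd <- data.frame(sex   = c("Female", "Male", "Male", "Male"),
                 score = c(0.75, 0.40, 0.73, 0.42),
                 row.names = c("A", "B", "C", "D"))
m <- matrix(seq_len(8), nrow = 2, dimnames = list(c("g1", "g2"), rownames(pd)))

subset_samples(m, pd, sex == "Male", score > 0.5)                   # only column C
subset_samples(m, pd, sex == "Male", rows = "g1", verbose = TRUE)   # row g1, columns B, C, D
```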
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793