[BioC] Biobase ExpressionSet: metadata on assayData

Thu Dec 20 19:52:21 CET 2007

Hi Eric -- Glad to be of help.

I did not see your 'ps' in the original message; perhaps the following
has been clarified, but in case not...

Are you sure you are using AnnotatedDataFrame as it is intended? As a
concrete example:

> library(Biobase)
> data(sample.ExpressionSet)
> dim(sample.ExpressionSet)
Features  Samples 
     500       26 
> phenoData(sample.ExpressionSet)
An object of class "AnnotatedDataFrame"
  sampleNames: A, B, ..., Z  (26 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
> pData(sample.ExpressionSet)
     sex    type score
A Female Control  0.75
B   Male    Case  0.40
C   Male Control  0.73
D   Male    Case  0.42
E Female    Case  0.93
F   Male Control  0.22
G   Male    Case  0.96
H   Male    Case  0.79
I Female    Case  0.37
J   Male Control  0.63
K   Male    Case  0.26
L Female Control  0.36
M   Male    Case  0.41
N   Male    Case  0.80
O Female    Case  0.10
P Female Control  0.41
Q Female    Case  0.16
R   Male Control  0.72
S   Male    Case  0.17
T Female    Case  0.74
U   Male Control  0.35
V Female Control  0.77
W   Male Control  0.27
X   Male Control  0.98
Y Female    Case  0.94
Z Female    Case  0.32
> varMetadata(sample.ExpressionSet)
      labelDescription
sex        Female/Male
type      Case/Control
score    Testing Score

pData returns the data.frame describing phenotypes, varMetadata
returns the meta-data describing the columns of pData. It's not clear
from your example below what variables v1 through v5 are meant to
represent, but your 'meta-data' 

>> treatment=c("D","192","233","192","233")
>> control=c(1,0,0,0,0)
>> dose=c(NA,30,10,10,0.3)
>> replicate=rep(1,5)

seems really to be meant (when appropriately rearranged) as components
of pData.
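A minimal sketch of that rearrangement, assuming each of the five
treatment conditions corresponds to one sample (one column of the
expression matrix). Only the four vectors come from your message; the
sample names s1 to s5 and the label descriptions are made up for
illustration.

```r
# Sample-level annotations from the original message: one value per sample
treatment <- c("D", "192", "233", "192", "233")
control   <- c(1, 0, 0, 0, 0)
dose      <- c(NA, 30, 10, 10, 0.3)
replicate <- rep(1, 5)

# pData: one row per sample, one column per annotation.
# The row names s1..s5 are hypothetical sample names.
pd <- data.frame(treatment, control, dose, replicate,
                 row.names = paste0("s", 1:5))

# varMetadata: one row per *column* of pd, describing that column.
# These descriptions are illustrative guesses, not from the original message.
vm <- data.frame(
    labelDescription = c("Compound identifier",
                         "1 if control, 0 otherwise",
                         "Dose level",
                         "Replicate number"),
    row.names = colnames(pd))

# Wrap both in an AnnotatedDataFrame when Biobase is installed
if (requireNamespace("Biobase", quietly = TRUE)) {
    library(Biobase)
    adf <- new("AnnotatedDataFrame", data = pd, varMetadata = vm)
    print(adf)
}
```

With this layout, treatment, control, dose and replicate are ordinary
columns of pData, so samples can be selected with the usual data.frame
idioms, e.g., pd$dose == 10.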

pData and varMetadata are defined to make access to the underlying
phenotypic data easy. The actual structure of the object is more
accurately represented by the function calls

> adf <- phenoData(sample.ExpressionSet) # phenotype AnnotatedDataFrame
> df <- pData(adf) # 'data' part of AnnotatedDataFrame
> md <- varMetadata(adf) # meta-data part of AnnotatedDataFrame

From your function below, it looks like you're trying to select
columns of pData based on their varMetadata. I'm not sure whether
there is a strong use case of this, but here's a little example

> pData <- data.frame(X=1:5, Y=5:1, Z=letters[1:5])
> varMetadata <- data.frame(
+     labelDescription=c(
+       "X description", "Y description", "Z description"),
+     metaA=c(TRUE,TRUE,FALSE),
+     metaB=c(TRUE,FALSE,TRUE),
+     metaC=c("yes", "no", "no"))
> adf <- new("AnnotatedDataFrame",
+            data=pData, varMetadata=varMetadata)

For interactive use, I'd probably do something like

> idx <- with(varMetadata(adf), metaA & metaB)
> adf[,idx]
An object of class "AnnotatedDataFrame"
  rowNames: 1, 2, ..., 5  (5 total)
  varLabels and varMetadata description:
    X: X description
  additional varMetadata: metaA, metaB, metaC

(this could be written in a single line, e.g.,

adf[,varMetadata(adf)$metaA & varMetadata(adf)$metaB]

but such brevity is both less efficient and more confusing).

'with' is providing an easy way to access the variables in
varMetadata(adf). The second argument to 'with' can be a series of
statements of the form,

with(varMetadata(obj), { <your statements here...> })
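As a small illustration of the block form, here is a sketch in base R
only. The data.frame 'vm' is a stand-in for varMetadata(adf) from the
example above, and the intermediate names 'flagged' and 'unsure' are
invented for the illustration.

```r
# Stand-in for varMetadata(adf): a plain data.frame with the same columns
# as the example above (base R only, so Biobase is not needed to try it)
vm <- data.frame(
    labelDescription = c("X description", "Y description", "Z description"),
    metaA = c(TRUE, TRUE, FALSE),
    metaB = c(TRUE, FALSE, TRUE),
    metaC = c("yes", "no", "no"),
    row.names = c("X", "Y", "Z"))

# Several statements inside with(); the value of the last one is returned
idx <- with(vm, {
    flagged <- metaA | metaB     # at least one of the two flags is set
    unsure  <- metaC == "no"     # metaC says "no"
    flagged & unsure
})
idx
```

The intermediate variables live only inside the with() call, so they do
not clutter the workspace.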

Your goal seems to be to create complex selection criteria. For this
case I find it very useful to stick to the paradigm of constructing
logical vectors and using the vectorized logical operators &, | and !.
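A minimal sketch of that paradigm, using plain named vectors in place of
the varMetadata columns (the names 'both', 'either' and 'onlyA' are
invented here). Note that the scalar forms && and || are not vectorized
and are not what you want for this.

```r
# Metadata flags as in the example above, held as plain named vectors
metaA <- c(X = TRUE, Y = TRUE,  Z = FALSE)
metaB <- c(X = TRUE, Y = FALSE, Z = TRUE)

both   <- metaA & metaB     # element-wise AND
either <- metaA | metaB     # element-wise OR
onlyA  <- metaA & !metaB    # A set and B not set

names(which(both))     # variables satisfying both conditions
names(which(onlyA))    # variables satisfying A but not B
```

Each condition gets its own name, so a complicated selection reads as a
combination of simple, separately checkable pieces.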

If you really wanted to make this kind of operation into a function
call, I might write something like

> adfMetaSelect <- function(adf, ..., how=all) {
+     dots <- match.call(expand.dots=FALSE)[["..."]]
+     res <- lapply(dots,
+                   function(elt, vm) with(vm, eval(elt)),
+                   vm=varMetadata(adf))
+     idx <- do.call(mapply, c(how, res))
+     adf[,idx]
+ }
> adfMetaSelect(adf, metaA, metaB)
An object of class "AnnotatedDataFrame"
  rowNames: 1, 2, ..., 5  (5 total)
  varLabels and varMetadata description:
    X: X description
  additional varMetadata: metaA, metaB, metaC
> adfMetaSelect(adf, metaA, !metaB)
An object of class "AnnotatedDataFrame"
  rowNames: 1, 2, ..., 5  (5 total)
  varLabels and varMetadata description:
    Y: Y description
  additional varMetadata: metaA, metaB, metaC
> adfMetaSelect(adf, metaC=="no")
An object of class "AnnotatedDataFrame"
  rowNames: 1, 2, ..., 5  (5 total)
  varLabels and varMetadata description:
    Y: Y description
    Z: Z description
  additional varMetadata: metaA, metaB, metaC

The 'how' argument specifies how the logical conditions provided in
... will be combined; with the default, all, every condition must be
true. Probably the mapply could be dropped and 'how' called directly
on the conditions if 'how' were, e.g., get("&").
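To make the combining step concrete, here is a sketch in base R with a
hand-written list standing in for 'res' inside the function. Reduce is a
swapped-in alternative for comparison, not something the function above
uses.

```r
# Logical vectors standing in for the evaluated conditions ('res' above)
res <- list(c(TRUE, TRUE, FALSE),
            c(TRUE, FALSE, TRUE),
            c(TRUE, TRUE, TRUE))

# What the function does: apply 'all' across corresponding elements
viaMapply <- do.call(mapply, c(all, res))

# Alternative: fold the list with the vectorized operator itself
viaReduce <- Reduce(`&`, res)

identical(viaMapply, viaReduce)

# Calling "&" directly through do.call handles exactly two conditions
viaDirect <- do.call(`&`, res[1:2])
```

The direct do.call form is the simplest, but "&" takes only two
arguments, so it stops working as soon as a third condition is supplied;
the mapply and Reduce forms accept any number.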

Perhaps this provides you with some ideas.

Martin

"Eric Lecoutre" <ericlecoutre at gmail.com> writes:

> Hi Martin;
>
> Sorry for the delay, and thank you for your detailed answer.
> I took some time to continue my investigations, and now things are
> clearer on what I should do with all those data (mostly that I have to use
> the phenotypic slot for my data on cell lines).
> There are nearly 100 cell lines used by my client, thus it is really worth
> using ExpressionSet structure for further analysis.
>
> Best wishes,
>
> Eric
>
>
>
> 2007/12/14, Eric Lecoutre <ericlecoutre at gmail.com>:
>>
>> Hi,
>>
>> I am new to Bioconductor and am studying both Biobase and biostatistics
>> for a small project.
>> My client wants to know whether he should use ExpressionSet for part of his
>> assay R&D process.
>> For an experiment, I understand there is a lot of common metadata like
>> compound, dose level, replicate,...
>> I have seen that pheno and feature data use the class AnnotatedDataFrame and
>> already told the client he could use that.
>> The fact is that those metadata (if I have understood well) could also be
>> used for gene expression (so assayData).
>> What is the standard Bioconductor way to handle those metadata? There is
>> no metadata argument associated with assayData.
>> Should I use an AnnotatedDataFrame for feature, repeating gene expression
>> with such metadata?
>>
>> btw, are there people here who really use ExpressionSet in their
>> processes?
>>
>> Thanks for any insight.
>>
>>
>> Eric
>>
>>
>> PS: as I looked at AnnotatedDataFrame class, I missed a helper function to
>> exploit metadata.
>> Here is such a little function and a sample use, where one requests for
>> variables in AnnotatedDataFrame with conditions on metadata (arbitrary ones,
>> handled by dots ...)
>>
>>
>>
>>
>> selectVariables <- function(x,logic=all,drop=FALSE,...){
>>   listCriteria <- list(...)
>>   metadata <- varMetadata(x)
>>   retainedCriteria <- list()
>>   sapply(names(listCriteria), function(critname) {
>>     if(!critname %in% colnames(metadata)){
>>       cat("\n Dropped criteria:",critname, "not in AnnotatedDataFrame\n")
>>     }else{
>>       if(is.null(listCriteria[critname])) listCriteria[[critname]]<-
>> unique(metadata[,critname])
>>        retainedCriteria[[critname]] <<-  metadata[,critname] %in%
>>         listCriteria[critname]
>>     }
>>     })
>>    criteriaValues <- do.call("cbind",retainedCriteria)
>>    selectedColumns <<- apply(criteriaValues,1,logic)
>>    cat('\n',sum(selectedColumns),' columns selected.\n',sep='')
>>    return(selectedColumns)
>> }
>>
>>
>>
>>
>> library(Biobase)
>> # preparing metadata
>> treatment=c("D","192","233","192","233")
>> control=c(1,0,0,0,0)
>> dose=c(NA,30,10,10,0.3)
>> replicate=rep(1,5)
>> metadata <- data.frame(
>>   cbind(treatment=treatment,control=control,dose=dose,replicate=replicate,
>>   labelDescription=paste("treatment: ",treatment, ifelse(control==1,
>>   " [control]","")," dose:",dose,"(",replicate,")",sep='')))
>>
>>   data1=data.frame(cbind(v1=1:2,v2=2:3,v3=3:4,v4=4:5,v5=5:6))
>> anData1 = new("AnnotatedDataFrame",data=data1,varMetadata=metadata)
>>
>>
>> # use little function to create a subset data.frame
>>
>> anData1[,selectVariables(anData1,dose=10, dummy=0)]
>>
>>
>>
>>
>>
>> --
>> Eric Lecoutre
>> Consultant - Business & Decision
>> Business Intelligence & Customer Intelligence
>>
>
>
>
> -- 
> Eric Lecoutre
> Consultant - Business & Decision
> Business Intelligence & Customer Intelligence

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793