[BioC] How to display or output expression data with non-unique feature labels?

Fri Jun 29 21:57:34 CEST 2007

Hi Masha --

What you'd like to do is bind the columns of the pData of featureData
to the expression data (and just show the first few lines):

> head(cbind(pData(featureData(data)), exprs(data)))
  name location          a          b           c          d           e
1   g1     pos1  1.6470129  0.8722101 -2.19475753 -0.4661532  1.01644889
2   g2     pos1 -1.1119872  1.0593626  1.28964663 -0.6109748 -0.41772721
3   g3     pos1 -1.5485456  0.4833932 -0.46199812  1.1731445 -0.09008735
4   g4     pos1 -0.7483785 -1.4618555  0.35941850 -0.8775726 -1.33541208
5   g5     pos1  1.5264111 -0.3953324  0.07780542  1.2670281  0.71261880
6   g1     pos1 -1.1379254  0.3625114  0.45684852  0.1956062  1.16690297

If you'd wanted to create the equivalent structure but with phenoData,
then as.data.frame(data) would have done this for you.

A couple of other comments...

> phenoData=new("AnnotatedDataFrame", data=pData, 
>   varMetadata=c("Gender","Age Group", "Location"))

I think varMetadata should be a data frame with a (required) column
called 'labelDescription', e.g,

> varMetadata=data.frame(labelDescription=
+   c("Gender","Age Group", "Location"))
> phenoData=new("AnnotatedDataFrame",
+  data=pData, varMetadata=varMetadata)

Depending of course on how involved you want to get, the 'annotation'
slot of ExpressionSet is meant to reference an annotation package that
contains information about features, e.g., maps between (unique) probe
names and (possibly repeating) gene names. This is easy for standard
chips, since the annotation packages already exist. More challenging
for custom chips, where you have to create the annotation package
yourself. Worth figuring out if you'll use the same chip design many
times -- the mapping just has to be done once, in creating the
annotation package.

Hope that helps,

Martin

"Kocherginsky, Masha [BSD] - HSD" <mkocherg at health.bsd.uchicago.edu> writes:

> Hello,
>
> I'm learning (slowly) the way eSet and ExpressionSet classes work.  What
> I want to do is probably quite simple, but I can't figure out the
> following: how do I either display (onscreen) or write into a file the
> expression levels of an ExpressionSet object AND include some or all
> featureData?  For example, in the ExpressionSet generated below I have
> g1-g5 replicated 4 times, and pos1-pos2 is replicated 10 times:
>
> set.seed(31415)
> exprs=data.frame(matrix(rnorm(100),nrow=20, ncol=5))
> colnames(exprs)=letters[1:5]
> pData=data.frame(matrix(rbinom(15,1,.5), nrow=5, byrow=T))
> rownames(pData)=letters[1:5]
> colnames(pData)=c("sex","agegrp","location")
> phenoData=new("AnnotatedDataFrame", data=pData, 
>   varMetadata=c("Gender","Age Group", "Location"))
>
> fNames=rep(c("g1","g2","g3","g4","g5"),4)
> fLoc=sort(rep(c("pos1","pos2"),10))
> fData=data.frame(name=fNames, location=fLoc)
> fDatalbl=data.frame(labelDescription=c("Name of Gene", "Location of
>
>   Gene"), row.names=colnames(fData))
> featureData=new("AnnotatedDataFrame", data=fData, varMetadata=fDatalbl)
> data=new("ExpressionSet", exprs=exprs, phenoData=phenoData, 
>   featureData=featureData)
>
>> data
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 20 features, 5 samples 
>   element names: exprs 
> phenoData
>   rowNames: a, b, ..., e (5 total)
>   varLabels and varMetadata:
>     sex: NA
>     agegrp: NA
>     location: NA
> featureData
>   rowNames: 1, 2, ..., 20 (20 total)
>   varLabels and varMetadata:
>     name: Name of Gene
>     location: Location of       
>       Gene
> experimentData: use 'experimentData(object)'
> Annotation character(0)
>
>
> I'd like to get something like this using some accessor function (is
> there one?):
>
>   fNames fLoc        a        b        c        d        e
> 1     g1 pos1  1.64701  0.87221 -2.19476 -0.46615  1.01645
> 2     g2 pos1 -1.11199  1.05936  1.28965 -0.61097 -0.41773
> 3     g3 pos1 -1.54855  0.48339 -0.46200  1.17314 -0.09009
> 4     g4 pos1 -0.74838 -1.46186  0.35942 -0.87757 -1.33541
> 5     g5 pos1  1.52641 -0.39533  0.07781  1.26703  0.71262
> 6     g1 pos1 -1.13793  0.36251  0.45685  0.19561  1.16690
>
> The data set I'm working with comes from a custom array with 3
> replicates for each gene, and thus the "feature" names are not unique.
> Am I setting up the ExpressionData object correctly? Is there a better
> way than to incorporate non-unique gene names in exprs than to use gene
> names via featureData? 
>
> Thanks for any advice!
> Masha
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Masha Kocherginsky, Ph.D.
> Research Associate (Assistant Professor)
> Department of Health Studies
> University of Chicago
> 5841 S. Maryland Ave.
> Chicago, IL  60637
>
>
> This email is intended only for the use of the individual or...{{dropped}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org