[BioC] Creating an expression set

Martin Morgan mtmorgan at fhcrc.org
Thu Jul 10 01:48:57 CEST 2008


Hi Nathan -- some suggestions below...

Nathan Harmston wrote:
> Hi,
> 
> I've just started playing with Bioconductor: I'm trying to figure out a
> problem I have with creating ExpressionSet from a text file containing
> normalised microarray data:
> 
> exp = as.matrix(read.table("rma.txt", header=TRUE, sep="\t", row.names=1,
> as.is=TRUE))
> pd  = read.table("groups.txt", row.names= 1 , header = TRUE, sep="\t")
> meta = data.frame(labelDescription = c("Age of patient"),
> row.names=c("age"))
> pheno = new("AnnotatedDataFrame", data=pd, varMetadata = meta)
> 
> This works up to here, but then it fails:
> 
> my_set = new("ExpressionSet", exprs=exp, phenoData=pheno,
> annotation="hgu133plus2")
> 
> Error in validObject(.Object) :
>   invalid class "ExpressionSet" object: sampleNames differ between assayData
> and phenoData
> 
> however when I do this:
> 
> m = new("ExpressionSet",exprs=exp)
> phenoData(m)=  new("AnnotatedDataFrame", data=pd, varMetadata = meta)
> annotation(m) = "hgu133plus2"

I think here if you did

 > validObject(m)

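you'd be told that sampleNames differ. Unfortunately, it's possible to 
create objects that are invalid: assigning phenoData after the fact 
apparently does not re-run the validity check that new() performs.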
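
To illustrate how an S4 object can end up invalid without any error, here 
is a minimal sketch using a made-up class. 'ToySet', its slots and its 
validity function are hypothetical and are not part of Biobase; the only 
point is that the check runs in new() but not on direct slot assignment.

```r
library(methods)

# Hypothetical toy class (NOT part of Biobase): a matrix plus a data.frame
# whose row names must equal the column names of the matrix.
setClass("ToySet",
         representation(exprs = "matrix", pheno = "data.frame"),
         validity = function(object) {
             if (!identical(colnames(object@exprs), rownames(object@pheno)))
                 "sampleNames differ between exprs and pheno"
             else
                 TRUE
         })

exprs <- matrix(1:4, nrow = 2,
                dimnames = list(c("g1", "g2"), c("s1", "s2")))
good  <- data.frame(age = c(30, 40), row.names = c("s1", "s2"))
bad   <- data.frame(age = c(30, 40), row.names = c("A", "B"))

# The validity function runs inside new() and passes here.
obj <- new("ToySet", exprs = exprs, pheno = good)

# Direct slot assignment does not re-run the validity function ...
obj@pheno <- bad

# ... so the mismatch only surfaces when validObject() is called explicitly.
res <- tryCatch(validObject(obj), error = function(e) conditionMessage(e))
print(res)
```

Calling validObject() after any piecewise modification is a cheap way to 
catch this kind of problem early.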

> 
> it works. Why doesn't the first version work, and what am I doing wrong? I see
> talk about se.expr but not sure how it all works together?

The problem is that the matrix 'exp' has column names and 'pd' has row 
names, and the two sets of names differ. You can probably see this with

 > colnames(exp)
 > rownames(pd)
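
With many samples it can be easier to let R do the comparison. A small 
sketch, using invented stand-ins for 'exp' and 'pd' (the sample names and 
values here are made up for illustration):

```r
# Toy stand-ins for the poster's 'exp' and 'pd'; all values are invented.
exp <- matrix(rnorm(6), nrow = 2,
              dimnames = list(c("probe1", "probe2"),
                              c("s1.CEL", "s2.CEL", "s3.CEL")))
pd <- data.frame(age = c(41, 57, 63), row.names = c("s1", "s2", "s3"))

# TRUE only if the names agree exactly, including their order.
same <- identical(colnames(exp), rownames(pd))

# Names that appear on one side but not on the other.
only_in_exp <- setdiff(colnames(exp), rownames(pd))
only_in_pd  <- setdiff(rownames(pd), colnames(exp))

print(same)
print(only_in_exp)
print(only_in_pd)
```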

The row names of pd and the colnames of exp are meant to refer to the 
same thing (the names of the sample) and new("ExpressionSet") is having 
a tough time figuring out which names are intended. You might try, e.g.,

 > colnames(exp) <- rownames(pd)

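(unless you like the column names of exp better, in which case do the 
assignment the other way around!). Note that this assumes the samples 
are in the same order in both objects. Then create your ExpressionSet.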

 > pheno = new("AnnotatedDataFrame", data=pd, varMetadata = meta)
 > my_set = new("ExpressionSet", exprs=exp, phenoData=pheno,
+  annotation="hgu133plus2")
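
If the two sets of names are the same but in a different order, then 
overwriting one with the other would silently mislabel samples. A sketch 
of reordering 'pd' instead, again with invented toy data; 'idx' and 
'pd_aligned' are just illustrative names:

```r
# Toy data: same sample names, but 'pd' is in a different order.
exp <- matrix(1:6, nrow = 2,
              dimnames = list(c("probe1", "probe2"), c("s1", "s2", "s3")))
pd <- data.frame(age = c(63, 41, 57), row.names = c("s3", "s1", "s2"))

# Position of each column name of 'exp' among the row names of 'pd'.
idx <- match(colnames(exp), rownames(pd))

# Stop if any sample in 'exp' has no matching row in 'pd'.
stopifnot(!anyNA(idx))

# Reorder the rows of 'pd' to follow the columns of 'exp'.
pd_aligned <- pd[idx, , drop = FALSE]

print(identical(rownames(pd_aligned), colnames(exp)))
```

'pd_aligned' could then be used in place of 'pd' when building the 
AnnotatedDataFrame.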

Martin

> 
> Nathan
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


