[BioC] problem to obtain exprset using GEOquery
Sean Davis
sdavis2 at mail.nih.gov
Thu Jul 13 22:37:36 CEST 2006
Stephen Henderson wrote:
> You can't. The information in the GSE object and the exprSet object
> contains probeset summary data, i.e., the summary of 11 probes into 1
> probeset value. rma uses all probes to create a probeset summary too.
> You need the CEL files, and only some GEO entries supply these, although
> occasionally the SOFT files contain rma data already.
Just to clarify, the algorithm for doing this would be something like:
1) Use GEOquery to process the GSE file using getGEO followed by a
little manipulation to obtain an exprSet. This exprSet contains
summarized data (and the summary method varies from GSE to GSE).
2) From the GEO website, look to see if raw data is available. If it
is, it will often contain the .CEL files. If that is the case, then
download the files and process using RMA (or whatever method you like).
The .CEL files will be named something like GSMXXXX.CEL. The GSMXXXX
part should typically match the GSMXXXX sample names in the exprSet from
the getGEO parsing. So, as long as you make sure the samples (columns)
are in the same order in both objects, you can replace the exprs() slot
of the first exprSet with the exprs() slot of the second exprSet (the
one you built from the .CEL files). Then, you will have all the
phenodata from the GSE, but with the expression values from your own
analysis.
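The "careful about the order" part is the step that is easiest to get wrong, so here is a minimal sketch of just that step. It is written in plain Python rather than R, purely to illustrate the bookkeeping; it is not GEOquery or affy code. The function names (gsm_id, reorder_columns), the sample names, and the numbers are all made up for illustration, and it assumes every CEL file name starts with its GSM accession. In R, the same idea would amount to matching the sample names of the two objects before copying the expression matrix across.

```python
import re


def gsm_id(filename):
    """Pull the GSM accession off the front of a CEL file name.

    Assumes names such as 'GSM101.CEL' or 'gsm101.cel.gz' (hypothetical).
    """
    m = re.match(r"(GSM\d+)", filename, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"no GSM accession in {filename!r}")
    return m.group(1).upper()


def reorder_columns(matrix, column_files, target_samples):
    """Reorder the columns of `matrix` to follow `target_samples`.

    matrix         -- list of rows, one value per CEL file (column)
    column_files   -- CEL file names, in the current column order
    target_samples -- GSM accessions, in the order used by the GSE data
    """
    ids = [gsm_id(f) for f in column_files]
    # Refuse to guess if the two sample sets are not exactly the same.
    if len(set(ids)) != len(ids) or sorted(ids) != sorted(target_samples):
        raise ValueError("CEL files and GSE samples do not match one-to-one")
    position = {sample: i for i, sample in enumerate(ids)}
    order = [position[sample] for sample in target_samples]
    return [[row[i] for i in order] for row in matrix]


# Made-up example: three samples, two probesets.
gse_samples = ["GSM101", "GSM102", "GSM103"]              # order in the GSE
cel_files = ["GSM103.CEL", "GSM101.CEL", "GSM102.CEL"]    # order on disk
rma_values = [
    [7.3, 7.1, 7.2],
    [5.3, 5.1, 5.2],
]

aligned = reorder_columns(rma_values, cel_files, gse_samples)
for row in aligned:
    print(row)
```

The check that raises an error is deliberate: if a CEL file is missing or extra, it is safer to stop than to silently attach expression values to the wrong sample's phenodata.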
Sean