[BioC] problem to obtain exprset using GEOquery

Sean Davis sdavis2 at mail.nih.gov
Thu Jul 13 22:37:36 CEST 2006


Stephen Henderson wrote:
> You can't. The information in the GSE object and the exprSet object
> is probeset summary data, i.e. the summary of 11 probes into 1
> probeset value. rma also builds its probeset summary from all the
> individual probes. You need the CEL files, and only some GEO entries
> supply these, although occasionally the SOFT files already contain
> rma data.

Just to clarify, the algorithm for doing this would be something like:

1)  Use getGEO from GEOquery to parse the GSE file, then do a little 
manipulation to obtain an exprSet.  This exprSet contains summarized 
data (and the summary method varies from GSE to GSE).

2)  On the GEO website, check whether raw data are available.  If so, 
the raw data will often include the .CEL files.  In that case, download 
the files and process them using RMA (or whatever method you like).  
The .CEL files will be named something like GSMXXXX.CEL.  The GSMXXXX 
part should typically match the GSMXXXX sample names in the exprSet from 
the getGEO parsing.  So, provided the samples (columns) are in the same 
order in both, you can replace the exprs() slot of the first exprSet 
with the exprs() slot from the second exprSet.  Then you will have all 
the phenoData from the GSE, but with the expression values from your own 
analysis.
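
The "careful about the order" part of step 2 can be sketched in plain R. 
This is only an illustration under stated assumptions: the two matrices 
below are made-up stand-ins for the exprs() of the GEO-derived exprSet 
and of your own RMA result, and the GSM accessions and probeset names 
are hypothetical.  It assumes the RMA columns are named after the .CEL 
files, which may not hold for every way of reading them in.

```r
# Toy stand-in for exprs() of the exprSet parsed from the GSE.
# (Hypothetical sample and probeset names; values are invented.)
geo_exprs <- matrix(
  c(7.1, 8.2, 6.5, 7.9, 9.0, 5.5),
  nrow = 2,
  dimnames = list(c("probeset_a", "probeset_b"),
                  c("GSM1001", "GSM1002", "GSM1003"))
)

# Toy stand-in for exprs() of your own RMA result. Columns are assumed to
# be named after the CEL files, in an order that differs from GEO's.
rma_exprs <- matrix(
  c(6.4, 5.6, 7.0, 8.1, 7.8, 9.1),
  nrow = 2,
  dimnames = list(c("probeset_a", "probeset_b"),
                  c("GSM1003.CEL", "GSM1001.CEL", "GSM1002.CEL"))
)

# Strip the file extension (and an optional .gz) to recover the GSM part.
cel_ids <- sub("\\.cel(\\.gz)?$", "", colnames(rma_exprs), ignore.case = TRUE)

# For each GEO sample, find the matching column in the RMA matrix.
idx <- match(colnames(geo_exprs), cel_ids)
stopifnot(!anyNA(idx))  # stop if any GEO sample has no matching CEL file

# Reorder the RMA columns into GEO sample order and use the GSM names.
rma_aligned <- rma_exprs[, idx, drop = FALSE]
colnames(rma_aligned) <- colnames(geo_exprs)
```

With the columns aligned this way, the matrix could then be put into the 
first object with the exprs() replacement function, e.g. 
exprs(eset) <- rma_aligned, where eset is a hypothetical name for the 
exprSet from step 1.  It is also worth checking that the probeset row 
names agree between the two matrices before swapping them.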

Sean


