[BioC] getGEO function to load files from other locations than GEO ?
Sean Davis
sdavis2 at mail.nih.gov
Fri Jun 1 22:51:18 CEST 2007
Wolfgang Raffelsberger wrote:
> Hi Sean,
>
> as you suggested, here is the output from readLines():
> > readLines('GSM180487.txt',n=10)
> [1]
> "TYPE\ttext\ttext\ttext\ttext\tinteger\tfloat\tfloat\ttext\ttext\ttext\tinteger\
> [2]
> "FEPARAMS\tProtocol_Name\tProtocol_date\tScan_Date\tScan_ScannerName\tScan_NumCh
> [3] "DATA\t44k_CGH_0605 (Editable)\t30-Jan-2006 18:01\t06-09-2006
> 13:25:24\tAgilent
> [4] "*"
> [5]
> "TYPE\tfloat\tfloat\tfloat\tinteger\tfloat\tfloat\tfloat\tinteger\tfloat\tfloat\
> [6]
> "STATS\tgDarkOffsetAverage\tgDarkOffsetMedian\tgDarkOffsetStdDev\tgDarkOffsetNum
> [7]
> "DATA\t38.965\t39\t6.13591\t1000\t38.884\t39\t7.85039\t1000\t1.00937\t1.0098\t3\
> [8] "*"
> [9]
> "TYPE\tinteger\tinteger\tinteger\ttext\tinteger\ttext\tinteger\tinteger\ttext\tt
> [10]
> "FEATURES\tFeatureNum\tRow\tCol\taccessions\tSubTypeMask\tSubTypeName\tProbeUID\
>
> Most lines in the output above are very long (experiment meta-data), so
> I truncated them, since I believe you mainly want to see what kind of
> output I get...
> Indeed, it doesn't look at all like the output you described.
> Does this mean that when first downloading from GEO I get a different
> kind of format?
>
Wolfgang,
I see where the confusion arises. GEO houses many formats of data in
its supplemental files. If you use getGEO() to download from GEO, you
will always get the correct format for use by GEOquery. If you choose
to download the supplemental files yourself, the format can be anything.
Indeed, you have downloaded an Agilent Feature Extraction file. There is
no general way to determine which of the many possible formats a given
supplemental file from GEO uses. That is why the SOFT format was created
and is used by the GEO group.
> Amazingly, accessing the data directly at GEO (without downloading
> first & trying to access the local copy) works without any
> difficulty...
>
>
That makes sense. Note the .soft extension of the file you get when you
use getGEO(), which is different from the .txt file that you downloaded.
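
To illustrate the two routes discussed here, the following is a minimal
sketch and not an official recipe. It assumes the GEOquery package is
installed, that network access is available for the download step, and
that getGEO() stores the downloaded sample as "<accession>.soft" in
destdir. The helper name load_gsm is hypothetical.

```r
# Sketch only: needs the GEOquery package, and network access for step 1.
# load_gsm is a hypothetical helper name, not part of GEOquery.
load_gsm <- function(accession = "GSM180487", destdir = tempdir()) {
  # 1. Let getGEO() fetch the SOFT record from GEO; the file is kept in destdir.
  gsm <- GEOquery::getGEO(accession, destdir = destdir)

  # 2. Assumed file name: the accession plus the ".soft" extension.
  soft_file <- file.path(destdir, paste0(accession, ".soft"))

  # 3. If that local copy exists, parse it again without contacting GEO.
  if (file.exists(soft_file)) {
    gsm <- GEOquery::getGEO(filename = soft_file)
  }
  gsm
}
```

Calling gsm <- load_gsm("GSM180487") and then GEOquery::Table(gsm)
should give the sample's data table. Passing a supplemental .txt file to
the filename argument is not expected to work, because that file is not
in SOFT format.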
> In the meantime I've managed to read the data using read.maimages() from
> limma, so there's no more urgency to find a solution on this issue.
> As I know too little about the various GEO formats, I'm afraid this may
> get too complicated... or I came across a bad example (here I'm not
> reading CGH data).
>
The file that you downloaded is not in a GEO format, which is where the
confusion is arising. If you want to parse the supplemental files, then
you will need to determine the file type and the correct parser for it.
If you stick to the GEO SOFT format, then GEOquery will work just fine.
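
For the two file types that came up in this thread, the check can be
sketched as below. This is a heuristic illustration only, not a GEOquery
function and not a general format detector. It assumes that SOFT files
begin with an entity line starting with "^" (for example
"^SAMPLE = ...") and that Agilent Feature Extraction text files contain
a tab-separated FEPARAMS row near the top, as in the readLines() output
quoted above. The function name guess_geo_file_format is hypothetical.

```r
# Hypothetical helper, not part of GEOquery: guess whether a local file is
# GEO SOFT or Agilent Feature Extraction output from its first few lines.
guess_geo_file_format <- function(path, n = 10) {
  first <- readLines(path, n = n, warn = FALSE)
  first <- first[nzchar(first)]          # ignore empty lines
  if (length(first) == 0) return("unknown")

  # SOFT files open with an entity line such as "^SAMPLE = GSM180487".
  if (grepl("^\\^(SAMPLE|SERIES|PLATFORM|DATASET|DATABASE)", first[1])) {
    return("soft")
  }

  # Agilent FE text files start with TYPE / FEPARAMS / DATA rows.
  if (any(grepl("^FEPARAMS\t", first))) {
    return("agilent_fe")
  }

  "unknown"
}

# Possible use (file names are placeholders):
#   "soft"       -> GEOquery::getGEO(filename = "GSM180487.soft")
#   "agilent_fe" -> limma::read.maimages("GSM180487.txt", source = "agilent")
```

Any other supplemental format would come back as "unknown" and would
need its own check and its own parser.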
> The route via getGEO() might have been more elegant/flexible, though ...
>
As you note above, getGEO() works just fine. The confusion arises
because NCBI GEO also stores supplemental files, which, of course,
GEOquery cannot parse. A fully general parser for all microarray data
formats is well beyond the scope of GEOquery. I will think about how
best to modify the documentation to make this absolutely clear.
I hope that clarifies things a bit, and sorry for the confusion.
Sean