[BioC] queryGEO fails on GDS files (GEO Datasets)
Peter
bioconductor-mailinglist at maubp.freeserve.co.uk
Wed Jan 4 16:27:23 CET 2006
This follows on from a question from Saurin D. Jani, on the list a year ago:
https://stat.ethz.ch/pipermail/bioconductor/2005-January/007405.html
A working example:
library(AnnBuilder)
geo <- GEO()
queryGEO(geo,"GSM107")
This downloads and parses:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM107&targ=self&form=text&view=data
The same call fails for GEO Datasets (GDS files) such as GDS813 (Saurin's
example) because the equivalent URL isn't accepted. Instead, NCBI returns
an HTML page which redirects you to:
http://www.ncbi.nlm.nih.gov/projects/geo/gds/gds_browse.cgi?gds=813
This page in turn can be used to download the actual GDS file (easy for a
human, a little trickier in code), but only in compressed form:
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS813.soft.gz
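R can read a gzip-compressed file directly through a gzfile() connection,
so the compression need not be an obstacle once the file is on disk. A
minimal sketch, assuming the FTP path above follows the same pattern for
other GDS accessions (only GDS813 is shown here); gds_ftp_url and
read_soft_lines are made-up names, not AnnBuilder functions:

```r
# Hypothetical helper: build the FTP URL for a GDS accession.
# Assumes every GDS file follows the pattern seen for GDS813.
gds_ftp_url <- function(acc) {
  sprintf("ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/%s.soft.gz", acc)
}

# Hypothetical helper: read all lines of a (possibly gzipped) file.
# gzfile() also reads uncompressed files transparently.
read_soft_lines <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con)
}

# Usage (needs network access):
# download.file(gds_ftp_url("GDS813"), "GDS813.soft.gz", mode = "wb")
# head(read_soft_lines("GDS813.soft.gz"))
```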
What this means is that, at the moment, queryGEO doesn't support GDS
files. Even if it did, they are generally large and only available in
compressed format, which complicates things further.
Would it make more sense to provide two separate functions:
Firstly, to download the file (dealing with all possible URLs) and, if
need be, decompress it.
Secondly, to parse a GEO file from the provided handle/filename/URL.
This makes sense for other large GEO files like the GPL annotation
files, as well as the GEO datasets (GDS files). It seems wasteful and
slow to download them fresh each time.
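A rough sketch of how the split might look, with the download step
skipping any file that is already on disk. The names fetch_geo_file and
parse_soft are hypothetical, not existing AnnBuilder functions, and the
parser is deliberately simplistic: it only separates SOFT header lines
(assumed to start with ^, ! or #) from the remaining table lines.

```r
# Hypothetical: download `url` to `destfile` unless a copy already exists.
fetch_geo_file <- function(url, destfile = basename(url), force = FALSE) {
  if (force || !file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary mode for .gz files
  }
  destfile
}

# Hypothetical: parse a local SOFT file (plain or gzipped).
parse_soft <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  # Assumption: header/metadata lines begin with ^, ! or #.
  is_header <- grepl("^[!#^]", lines)
  list(header = lines[is_header],
       table  = lines[!is_header & nzchar(lines)])
}

# Usage (the first call needs network access; later calls reuse the file):
# path <- fetch_geo_file(
#   "ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS813.soft.gz")
# gds <- parse_soft(path)
```

Keeping the two steps apart also means the parser can be pointed at a
file obtained some other way, for example one downloaded by hand.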
Peter