[BioC] queryGEO fails on GDS files (GEO Datasets)
Sean Davis
sdavis2 at mail.nih.gov
Wed Jan 4 16:50:50 CET 2006
Peter,
I recently uploaded a new package to Bioconductor called GEOquery. It
is available as a development package
(http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html), but it
has few dependencies, so it should work with recent R and Bioconductor
releases. It can download and parse GDS, GSM, GPL, and GSE records.
(GSE download and parsing seems to be broken on Windows, at least for some
GSEs; I am working on that.) After installing, you could do:
> library(GEOquery)
# the following takes about a minute or so....
> gds813 <- getGEO('GDS813')
And then to convert to an exprSet, simply do:
> eset <- GDS2eSet(gds813,do.log2=TRUE)
> eset
Expression Set (exprSet) with
22690 genes
20 samples
phenoData object with 4 variables and 38 cases
varLabels
: sample
: disease.state
: tissue
: description
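
On the suggestion quoted below of splitting the download step from the
parsing step, here is a rough base-R sketch of how that might look. The
helper names fetch_gds_soft and read_soft_table are hypothetical (they are
not part of GEOquery or AnnBuilder), the FTP directory is simply the one
quoted below and may change, and the parser assumes only that SOFT header
lines start with ^, ! or #.

```r
# Hypothetical helpers for illustration only; base R, no extra packages.

# Step 1: download a GDS SOFT file once and reuse the local copy afterwards.
# base_url is an assumption taken from the FTP link quoted in this thread.
fetch_gds_soft <- function(gds_id, destdir = tempdir(),
                           base_url = "ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz") {
  fname <- paste(gds_id, ".soft.gz", sep = "")
  dest <- file.path(destdir, fname)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the gzip stream intact on Windows
    download.file(paste(base_url, fname, sep = "/"), dest, mode = "wb")
  }
  dest
}

# Step 2: parse a local SOFT file (gzipped or plain) into header lines
# and a data table. gzfile() reads uncompressed files transparently.
read_soft_table <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  is_meta <- grepl("^[!#^]", lines)   # ^entity, !attribute, #column lines
  body <- lines[!is_meta]             # header row plus data rows
  tab <- NULL
  if (length(body) > 0) {
    tab <- read.delim(text = body, stringsAsFactors = FALSE,
                      check.names = FALSE)
  }
  list(meta = lines[is_meta], table = tab)
}
```

With these, read_soft_table(fetch_gds_soft("GDS813")) would only hit the
network the first time, as long as destdir points at a directory that
persists between sessions (the default tempdir() does not).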
Sean
On 1/4/06 10:27 AM, "Peter" <bioconductor-mailinglist at maubp.freeserve.co.uk>
wrote:
> This follows on from a question from Saurin D. Jani, on the list a year ago:
>
> https://stat.ethz.ch/pipermail/bioconductor/2005-January/007405.html
>
> A working example:
>
> library(AnnBuilder)
> geo <- GEO()
> queryGEO(geo,"GSM107")
>
> This downloads and parses:-
>
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM107&targ=self&form=text&view=data
>
> This fails for GEO Datasets (GDS files) like GDS813 (Saurin's example)
> because the URL isn't accepted - the NCBI returns an HTML page which
> redirects you to:
>
> http://www.ncbi.nlm.nih.gov/projects/geo/gds/gds_browse.cgi?gds=813
>
> This page in turn can be used (by a human, a little more tricky in code)
> to download the actual GDS file - but only in compressed form:
>
> ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS813.soft.gz
>
> What this means is that at the moment, queryGEO doesn't support GDS
> files. Even if it did, they are generally large and only available in
> compressed format, making things generally more complicated.
>
> Would it make more sense to provide two separate functions:
>
> Firstly, to download the file (dealing with all possible URLs) and if
> need be decompress it.
>
> Secondly, to parse a GEO file from the provided handle/filename/url
>
> This makes sense for other large GEO files like the GPL annotation
> files, as well as the GEO datasets (GDS files). It seems wasteful and
> slow to download them fresh each time.
>
> Peter
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor