[BioC] GEOquery - was queryGEO fails on GDS files (GEO Datasets)
Peter
bioconductor-mailinglist at maubp.freeserve.co.uk
Wed Jan 11 20:29:12 CET 2006
Sean Davis wrote:
>Peter,
>
>I have recently uploaded a new package to bioconductor called GEOquery.
I've had a little play - very nice work. Cheers. Just a few
queries/questions for you...
I never did work out how to load the package from the source files, but
I noticed there is now a Windows binary package on the website...
http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html
I downloaded the ZIP file and installed it on Windows XP with R 2.1.1
and got the following warning:
package 'GEOquery' successfully unpacked and MD5 sums checked
updating HTML package descriptions
Warning message:
no package 'file15658' was found in: packageDescription(i, fields =
"Title", lib.loc = lib)
Question One
------------
Is the above "no package" warning important?
-------------------------------------------------------------------
Question Two
------------
> library(GEOquery)
Warning message:
package 'GEOquery' was built under R version 2.3.0
Does the version of R matter? I assume R version 2.3.0 is the
development version of R, as 2.2.1 is the latest official release.
-------------------------------------------------------------------
Question Three
--------------
> gds37 <- getGEO('GDS37', destdir="c:/temp/geo")
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS37.soft.gz'
ftp data connection made, file length 132384 bytes
opened URL
downloaded 129Kb
File stored at:
c:/temp/geo/GDS37.soft.gz
c:/temp/geo/GDS37.soft.gz
parsing geodata
parsing subsets
ready to return
Why does it print the file location twice?
-------------------------------------------------------------------
Question Four
-------------
If I repeat the getGEO command, why does it download the file again?
> gds37 <- getGEO('GDS37', destdir="c:/temp/geo")
I would personally have written the getGEO code to check in the
destination folder for the files GDS37.soft or GDS37.soft.gz and just
load the local copy if it existed.
I know I should use the following instead:
> gds37 <- getGEO(filename="c:/temp/geo/GDS37.soft.gz")
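A minimal sketch of the local-copy check suggested above. The helper names findLocalSoft and getGEOCached are hypothetical and not part of GEOquery; the only GEOquery calls assumed are the two getGEO forms already shown in this message (accession plus destdir, and filename=).

```r
# Hypothetical helper (not part of GEOquery): look in destdir for
# <accession>.soft or <accession>.soft.gz and return the first one found,
# or NA if neither file exists.
findLocalSoft <- function(geo, destdir) {
  candidates <- file.path(destdir,
                          paste(geo, c(".soft", ".soft.gz"), sep = ""))
  existing <- candidates[file.exists(candidates)]
  if (length(existing) > 0) existing[1] else NA_character_
}

# Hypothetical wrapper: load the local copy if there is one,
# otherwise fall back to downloading into destdir.
getGEOCached <- function(geo, destdir = tempdir()) {
  localFile <- findLocalSoft(geo, destdir)
  if (!is.na(localFile)) {
    GEOquery::getGEO(filename = localFile)
  } else {
    GEOquery::getGEO(geo, destdir = destdir)
  }
}
```

With this, a call such as getGEOCached('GDS37', destdir="c:/temp/geo") would only reach the FTP site the first time. The sketch does not check whether the local file is complete or up to date.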
-------------------------------------------------------------------
Question Five
-------------
I like how you have handled converting subset information into phenotype
data in GDS2eSet.
Have you considered also parsing the "description" to extract the
"Alternative Sample Name" and the "Sample Source"?
As far as I can tell, all the current NCBI GDS files use the same format
for the description lines:
"Value for SAMPLENAME: ALTNAME; src: SOURCE"
On the other hand, this is clearly not a "defined field" and is subject
to change. Maybe parse a line automatically if and only if it
follows that format?
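A rough sketch of that "if and only if" parsing, assuming the description format quoted above. The function name parseGdsDescription and the column names are hypothetical, and the sample strings in the usage note are made up for illustration. Lines that do not match are left as NA rather than guessed at.

```r
# Hypothetical helper: split GDS description lines of the form
#   "Value for SAMPLENAME: ALTNAME; src: SOURCE"
# into three columns. Lines in any other format yield NA in every column.
parseGdsDescription <- function(description) {
  pattern <- "^Value for ([^:]+): (.*); src: (.*)$"
  matched <- grepl(pattern, description)
  extract <- function(group) {
    # Build the backreference "\1", "\2" or "\3" for sub()
    backref <- paste("\\", group, sep = "")
    ifelse(matched, sub(pattern, backref, description), NA_character_)
  }
  data.frame(sample  = extract(1),
             altName = extract(2),
             source  = extract(3),
             stringsAsFactors = FALSE)
}
```

For example, parseGdsDescription("Value for GSM123: control rep 1; src: liver") would give sample "GSM123", altName "control rep 1" and source "liver", while a free-text description would give a row of NA values.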
-------------------------------------------------------------------
Thanks again - GEOquery looks like it will be very handy...
Peter