[BioC] GEOquery - was queryGEO fails on GDS files (GEO Datasets)
Ting-Yuan Liu
tliu at fhcrc.org
Thu Jan 12 19:39:52 CET 2006
Hi Peter,
For Question 2: the warning appears because GEOquery is not in the BioC
1.7 release. At the moment it is only in the BioC devel (1.8) repository,
and packages there are built with the R devel (2.3) version.
I think you can ignore the warning message at this stage. If you are
really concerned about it, you can install the R devel version on your XP
machine and then run GEOquery on it. We recommend installing BioC devel
packages on the R devel version, and BioC stable packages on the R stable
version.
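
To see which R version an installed package was built under, and whether
that matches the R you are running, something like the following should
work. It is only a rough sketch: built_r_version and same_r_series are
made-up helper names, and it assumes the "Built" field of the installed
package description starts with "R x.y.z;".

```r
# Hypothetical helper: extract the R version from the "Built" field
# of an installed package (assumed to look like "R 2.3.0; ; ...").
built_r_version <- function(pkg) {
  built <- utils::packageDescription(pkg, fields = "Built")
  if (length(built) != 1 || is.na(built)) {
    stop("package '", pkg, "' does not seem to be installed")
  }
  sub("^R ([0-9.]+);.*$", "\\1", built)
}

# Hypothetical helper: TRUE if two R versions share major.minor,
# e.g. 2.3.0 and 2.3.1, but not 2.3.0 and 2.1.1.
same_r_series <- function(built, running = getRversion()) {
  b <- unclass(package_version(as.character(built)))[[1]]
  r <- unclass(package_version(as.character(running)))[[1]]
  identical(b[1:2], r[1:2])
}

# Usage (requires the package to be installed):
# same_r_series(built_r_version("GEOquery"))
```

A FALSE result corresponds to the situation behind the "was built under
R version" warning: the package was built with a different R series than
the one loading it.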
HTH,
Ting-Yuan
______________________________________
Ting-Yuan Liu
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA, USA
______________________________________
On Wed, 11 Jan 2006, Peter wrote:
> Sean Davis wrote:
> >Peter,
> >
> >I have recently uploaded a new package to bioconductor called GEOquery.
>
> I've had a little play - very nice work. Cheers. Just a few
> queries/questions for you...
>
> I never did work out how to load the package from the source files, but
> I noticed there is now a Windows binary package on the website...
>
> http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html
>
> I downloaded the ZIP file and installed it on Windows XP with R 2.1.1
> and got the following warning:
>
> package 'GEOquery' successfully unpacked and MD5 sums checked
> updating HTML package descriptions
> Warning message:
> no package 'file15658' was found in: packageDescription(i, fields =
> "Title", lib.loc = lib)
>
> Question One
> ------------
> Is the above "no package" warning important?
>
> -------------------------------------------------------------------
>
> Question Two
> ------------
>
> > library(GEOquery)
> Warning message:
> package 'GEOquery' was built under R version 2.3.0
>
> Does the version of R matter? I assume R version 2.3.0 is the
> development version of R, as 2.2.1 is the latest official release.
>
> -------------------------------------------------------------------
>
> Question Three
> --------------
>
> > gds37 <- getGEO('GDS37', destdir="c:/temp/geo")
> trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS37.soft.gz'
> ftp data connection made, file length 132384 bytes
> opened URL
> downloaded 129Kb
>
> File stored at:
> c:/temp/geo/GDS37.soft.gz
> c:/temp/geo/GDS37.soft.gz
> parsing geodata
> parsing subsets
> ready to return
>
> Why does it print the file location twice?
>
> -------------------------------------------------------------------
>
> Question Four
> -------------
> If I repeat the command getGEO, why does it re-download the file?
>
> > gds37 <- getGEO('GDS37', destdir="c:/temp/geo")
>
> I would personally have written the getGEO code to check in the
> destination folder for the files GDS37.soft or GDS37.soft.gz and just
> load the local copy if it existed.
>
> I know I should use the following instead:
>
> > gds37 <- getGEO(filename="c:/temp/geo/gds37.soft.gz")
>
>
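On Question Four: the check described above (look in the destination
folder first, download only if nothing is there) could be sketched as
below. This is an illustration, not how getGEO itself is implemented.
find_local_soft and get_geo_cached are hypothetical names; the only
GEOquery call used is getGEO with the filename and destdir arguments
already shown in this thread.

```r
# Hypothetical helper: return the path of an existing local SOFT file
# for an accession (plain or gzipped), or NA if none is present.
find_local_soft <- function(accession, destdir) {
  candidates <- file.path(destdir,
                          paste0(accession, c(".soft", ".soft.gz")))
  hit <- candidates[file.exists(candidates)]
  if (length(hit) > 0) hit[1] else NA_character_
}

# Hypothetical wrapper: load the local copy if there is one,
# otherwise let getGEO download into destdir as usual.
get_geo_cached <- function(accession, destdir) {
  local_file <- find_local_soft(accession, destdir)
  if (!is.na(local_file)) {
    GEOquery::getGEO(filename = local_file)
  } else {
    GEOquery::getGEO(accession, destdir = destdir)
  }
}

# Usage (needs GEOquery installed):
# gds37 <- get_geo_cached("GDS37", "c:/temp/geo")
```

Note that file names are matched exactly as given, so on a
case-sensitive file system "gds37.soft.gz" would not be found when
asking for "GDS37".
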
> -------------------------------------------------------------------
>
> Question Five
> -------------
> I like how you have handled converting subset information into phenotype
> data in GDS2eSet.
>
> Have you considered also parsing the "description" to extract the
> "Alternative Sample Name" and the "Sample Source"?
>
> As far as I can tell, all the current NCBI GDS files use the same format
> for the description lines:
>
> "Value for SAMPLENAME: ALTNAME; src: SOURCE"
>
> On the other hand, this is clearly not a "defined field" and is subject
> to change. Maybe automatically parse the lines if and only if it
> follows that format?
>
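On Question Five: parsing the description "if and only if it follows
that format" could look roughly like this. It is a sketch under the
assumption that the lines really have the form quoted above;
parse_gds_description is a hypothetical name, not a GEOquery function,
and the sample strings in the usage lines are invented for illustration.

```r
# Hypothetical helper: split description lines of the form
#   "Value for SAMPLENAME: ALTNAME; src: SOURCE"
# into three columns. Lines that do not match give NA in every column.
parse_gds_description <- function(x) {
  pattern <- "^Value for ([^:]+): (.*); src: (.*)$"
  matches <- regmatches(x, regexec(pattern, x))
  fields <- t(vapply(matches, function(m) {
    # A successful match has 4 elements: full match plus 3 groups.
    if (length(m) == 4) m[2:4] else rep(NA_character_, 3)
  }, character(3)))
  data.frame(sample   = fields[, 1],
             alt_name = fields[, 2],
             source   = fields[, 3],
             stringsAsFactors = FALSE)
}

# Usage with made-up description lines:
desc <- c("Value for GSM123: tumor_a; src: liver",
          "a line in some other format")
parsed <- parse_gds_description(desc)
```

Because non-matching lines come back as NA rather than raising an
error, a change in the description format would show up as missing
values instead of silently wrong phenotype data.
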
> -------------------------------------------------------------------
>
> Thanks again - GEOquery looks like it will be very handy...
>
> Peter
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>