[BioC] GEOquery
Cei Abreu-Goodger
cei at sanger.ac.uk
Fri Jul 11 17:57:40 CEST 2008
Ok, I just realized that the options can be passed quite easily:
getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/",
"ftp.use.epsv"=0)
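For reference, here is a minimal sketch of two other ways the same option can be handed to RCurl: through the `.opts` list, or through a reusable curl handle. It assumes RCurl is installed and the NCBI FTP server is reachable; the variable names `url`, `h` and `listing` are only for illustration.

```r
library(RCurl)

url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"

# Pass the option through the .opts list instead of as a direct argument
listing <- getURL(url, .opts = list(ftp.use.epsv = FALSE))

# Or set the option once on a curl handle and reuse that handle
h <- getCurlHandle(ftp.use.epsv = FALSE)
listing <- getURL(url, curl = h)

cat(listing)
```

Both forms only affect calls where I supply the options or the handle myself, so they do not help with functions that call getURL() internally.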
But now we return to the original issue: how do I use this parameter to
get getGEO working, since it doesn't pass on extra parameters?
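One avenue I have not been able to confirm: RCurl's getCurlHandle() takes its defaults from the global R option "RCurlOptions", so setting the option there might reach the calls made inside getGEO. This is only a sketch, under the assumption that the installed RCurl version supports "RCurlOptions" and that GEOquery creates its curl handles in the default way.

```r
library(RCurl)
library(GEOquery)

# Assumption: handles created inside GEOquery pick up these defaults
options(RCurlOptions = list(ftp.use.epsv = FALSE))

g <- getGEO("GSE4201", GSEMatrix = TRUE)
```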
Let me re-state:
library(GEOquery)
g<-getGEO("GSE4201",GSEMatrix=TRUE)
This times out when no ftp_proxy is set (which could be solved if I were
able to disable the ftp.use.epsv option of RCurl):
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
couldn't connect to host
or if I use our proxy server, it gets trapped in HTML garbage:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, :
line 1 did not have 8 elements
This apparently cannot be worked around on my side; I've already asked our
IT department whether they could change the proxy server settings.
Any suggestions?
Cei
> sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu
locale:
C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] biomaRt_1.14.0 GEOquery_2.4.0 RCurl_0.9-3 Biobase_2.0.1
loaded via a namespace (and not attached):
[1] XML_1.95-2
Cei Abreu-Goodger wrote:
> Hi Sean,
>
> I'm trying to help Harpreet to get the GEOquery library working
> properly over here. Thanks to what you pointed out, we are able to
> track the problem down to curl using our http proxy, which for ftp
> transfers is not required. We still have one problem, that I can't
> figure how to turn off the "ftp.use.epsv" option in RCurl. So, on a
> linux terminal, I can use:
>
> curl --disable-epsv
> "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
> -r--r--r-- 1 ftp anonymous 930471 Apr 13 05:32
> GSE4201_series_matrix.txt.gz
>
> (without the --disable-epsv it times out unless I set the ftp_proxy,
> but then I get the HTML index instead of the file listing)
>
> inside R, I imagine I have to turn the "ftp.use.epsv" option off, and
> I've tried doing something like this:
>
> myCurl <- getCurlOptionsConstants()
> myCurl[["ftp.use.epsv"]] <- 0
> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/",
> .opts=list(myCurl))
>
> but it keeps timing out...
>
> I also tried:
>
> curlSetOpt("ftp.use.epsv"=0)
>
> but that doesn't seem to have any effect on what
> getCurlOptionsConstants() returns, it just creates a CURLOptions
> object, which I can't figure out how to use.
>
> Do you have any suggestions, or should I search for help directly with
> the RCurl developers?
>
> Many thanks,
>
> Cei
>> So, this appears to be the problem. It looks like your proxy is
>> intercepting the ftp directory listing and converting it to HTML. I
>> do not know how to solve this problem, as it appears to be a proxy
>> configuration issue at your institution. However, I can't say for
>> sure. The output of the getURL() command should look like:
>>
>>
>>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>>>
>> [1] "-r--r--r-- 1 ftp anonymous 930471 Apr 13 05:32
>> GSE4201_series_matrix.txt.gz\n"
>>
>> Notice how yours is much longer and is HTML, not plain text.
>>
>> Sean
--
Cei Abreu-Goodger, PhD
Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK