[BioC] GEOquery

Sean Davis sdavis2 at mail.nih.gov
Fri Jul 11 19:54:07 CEST 2008


On Fri, Jul 11, 2008 at 11:57 AM, Cei Abreu-Goodger <cei at sanger.ac.uk> wrote:
> Ok, I just realized that the options can be passed quite easily:
> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/",
> "ftp.use.epsv"=0)
>
> But now we return to the original issue: how do I use this parameter to get
> getGEO working, since it doesn't pass on extra parameters?
>
> Let me re-state:
>
> library(GEOquery)
> g<-getGEO("GSE4201",GSEMatrix=TRUE)
>
> Times out when no ftp_proxy is set (which could be solved if I was able to
> disable the ftp.use.epsv option of RCurl):
>
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>  couldn't connect to host
>
>
> or if I use our proxy server, it gets trapped in HTML garbage:
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>  line 1 did not have 8 elements
>
>
> This apparently cannot be worked around; I've already asked our IT
> department to see if they could change the proxy server settings.
>
> Any suggestions?

Thanks for all the work to get to this point.

I'll look into what the best changes would be for GEOquery.  The
original idea of the getGEO() function was to maximize simplicity, but
that simplicity clearly causes problems when arguments need to be
passed through to internal functions.
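
In the meantime, one possible stopgap is to set the option session-wide
rather than per call.  The sketch below is untested against RCurl 0.9-3
and rests on an assumption: that the installed RCurl reads
getOption("RCurlOptions") as defaults when it creates a new curl handle
(later RCurl versions do this in getCurlHandle()).  Whether every
download step inside getGEO() honours it is also unverified.  The helper
names gse_matrix_url(), epsv_off_options() and looks_like_html() are
made up for illustration and are not part of GEOquery or RCurl.

```r
# Sketch only. Assumes RCurl picks up getOption("RCurlOptions") as
# defaults for new curl handles; helper names are hypothetical.

# Build the Series Matrix directory URL for a GSE accession
# (same layout as the URL quoted in this thread).
gse_matrix_url <- function(gse) {
  paste("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/", gse, "/", sep = "")
}

# Merge ftp.use.epsv = FALSE into any existing session-wide RCurl defaults.
epsv_off_options <- function(current = getOption("RCurlOptions")) {
  if (is.null(current)) current <- list()
  current[["ftp.use.epsv"]] <- FALSE
  current
}

# Crude check for the proxy problem: a plain FTP listing does not
# start with "<", whereas a proxy-generated HTML index does.
looks_like_html <- function(listing) {
  length(grep("^[[:space:]]*<", listing)) > 0
}

if (FALSE) {  # not run here: needs network access, RCurl and GEOquery
  library(RCurl)
  library(GEOquery)
  options(RCurlOptions = epsv_off_options())
  listing <- getURL(gse_matrix_url("GSE4201"))
  if (!looks_like_html(listing)) {
    g <- getGEO("GSE4201", GSEMatrix = TRUE)
  }
}
```

Checking the listing first separates the two failure modes reported
above: a timeout points at EPSV, while an HTML listing points at the
proxy.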

Sean

>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  tools     utils     methods
> [8] base
>
> other attached packages:
> [1] biomaRt_1.14.0 GEOquery_2.4.0 RCurl_0.9-3    Biobase_2.0.1
>
> loaded via a namespace (and not attached):
> [1] XML_1.95-2
>
>
>
>
>
> Cei Abreu-Goodger wrote:
>>
>> Hi Sean,
>>
>> I'm trying to help Harpreet get the GEOquery library working properly
>> over here. Thanks to what you pointed out, we were able to track the problem
>> down to curl using our http proxy, which is not required for ftp transfers.
>> We still have one problem: I can't figure out how to turn off the
>> "ftp.use.epsv" option in RCurl. So, on a linux terminal, I can use:
>>
>> curl --disable-epsv "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
>> -r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32
>> GSE4201_series_matrix.txt.gz
>>
>> (without the --disable-epsv it times out unless I set the ftp_proxy, but
>> then I get the HTML index instead of the file listing)
>>
>> inside R, I imagine I have to turn the "ftp.use.epsv" option off, and I've
>> tried doing something like this:
>>
>> myCurl <- getCurlOptionsConstants()
>> myCurl[["ftp.use.epsv"]] <- 0
>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/",
>> .opts=list(myCurl))
>>
>> but it keeps timing out...
>>
>> I also tried:
>>
>> curlSetOpt("ftp.use.epsv"=0)
>>
>> but that doesn't seem to have any effect on what getCurlOptionsConstants()
>> returns, it just creates a CURLOptions object, which I can't figure out how
>> to use.
>>
>> Do you have any suggestions, or should I search for help directly with the
>> RCurl developers?
>>
>> Many thanks,
>>
>> Cei
>>>
>>> So, this appears to be the problem.  It looks like your proxy is
>>> intercepting the ftp directory listing and converting it to HTML.  I
>>> do not know how to solve this problem, as it appears to be a proxy
>>> configuration issue at your institution.  However, I can't say for
>>> sure.  The output of the getURL() command should look like:
>>>
>>> > getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>>>
>>> [1] "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32
>>> GSE4201_series_matrix.txt.gz\n"
>>>
>>> Notice how yours is much longer and is HTML, not plain text.
>>>
>>> Sean
>>>
>>>
>>>
>>
>>
>
>
> --
> Cei Abreu-Goodger, PhD
>
> Wellcome Trust Sanger Institute
> Computational and Functional Genomics
> Wellcome Trust Genome Campus
> Hinxton, Cambridge, CB10 1SA, UK
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
>


