[Rd] download.file() on ftp URL fails in windows with default download method
Dan Tenenbaum
dtenenba at fredhutch.org
Wed Aug 12 22:34:43 CEST 2015
Hi David,
----- Original Message -----
> From: "David Smith" <davidsmi at microsoft.com>
> To: "Dan Tenenbaum" <dtenenba at fredhutch.org>, "Uwe Ligges" <ligges at statistik.tu-dortmund.de>, "Elliot Waingold"
> <Elliot.Waingold at microsoft.com>
> Cc: "R-devel at r-project.org" <r-devel at r-project.org>
> Sent: Wednesday, August 12, 2015 12:42:39 PM
> Subject: RE: [Rd] download.file() on ftp URL fails in windows with default download method
>
> We were also able to reproduce the issue on Windows Server 2012. If
> there's anything we can do to help please let me know; Elliot
> Waingold (CC'd here) can provide access to the VM we used for
> testing if that's of any help.
>
Thanks!
I have just been looking at this issue with Martin Morgan. We found that if we OR in the additional flag INTERNET_FLAG_PASSIVE on line 1012 of src/modules/internet/internet.c (R-3.2 branch, last changed in r68393), the ftp connection works.
Further investigation reveals that in an active ftp connection (the default when the passive flag is not set), certain ports on the client need to be open, because the server connects back to the client for the data transfer.
This machine is in the Amazon cloud, so it was easy to open the ports. But we still have a problem, and I believe it is that the wrong IP address is being sent to the server: an AWS machine thinks of itself as having one IP address, but that is a private address that is valid only inside AWS.
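For illustration, in active mode the client tells the server where to connect back with a PORT command (RFC 959), which carries the client's IPv4 address and port as six comma-separated numbers. A minimal shell sketch of that encoding follows; the function name ftp_port_command and the address 10.0.0.5 are made up for the example and are not taken from R or from the machine above:

```shell
#!/bin/sh
# Build an FTP PORT command (RFC 959) from an IPv4 address and a TCP port.
# Format: PORT h1,h2,h3,h4,p1,p2  where port = p1*256 + p2.
# ftp_port_command is a hypothetical helper name used only for illustration.
ftp_port_command() {
    ip=$1
    port=$2
    hi=$((port / 256))   # high byte of the port number
    lo=$((port % 256))   # low byte of the port number
    # Dots in the address become commas, followed by the two port bytes.
    printf 'PORT %s,%s,%s\n' "$(printf '%s' "$ip" | tr '.' ',')" "$hi" "$lo"
}

# Example with an assumed private address:
#   ftp_port_command 10.0.0.5 50000
# prints: PORT 10,0,0,5,195,80
```

If the address in that command is a private one such as 10.0.0.5, a server outside the private network cannot connect back to it, which would match the failure described above.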
Here's a curl command line that gets around this by sending the correct address (or hostname):
curl --ftp-port myhostname.com ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.13.assembly.txt
Curl normally uses passive mode, which is why it works, but the --ftp-port switch tells it to use active mode with the specified IP address or hostname.
So I'm not sure where we go from here. One easy fix is just to add the INTERNET_FLAG_PASSIVE flag as described above. Another would be to first check if active mode works, and if not, use passive mode.
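The second option could look roughly like this at the command line. This is only a sketch using curl, not the WinINet code in internet.c; fetch_ftp is a hypothetical helper name and the 60-second limit is an arbitrary assumption:

```shell
#!/bin/sh
# Sketch: try an active-mode FTP download first, then fall back to passive.
# fetch_ftp is a hypothetical helper; --ftp-port and --ftp-pasv are curl options.
fetch_ftp() {
    url=$1
    dest=$2
    # "--ftp-port -" requests active mode using the address of the
    # control connection; --max-time bounds a stalled transfer.
    if curl --silent --fail --max-time 60 --ftp-port - --output "$dest" "$url"; then
        echo "active"
        return 0
    fi
    # Active mode failed (for example, the server could not connect back),
    # so retry the same URL in passive mode.
    if curl --silent --fail --max-time 60 --ftp-pasv --output "$dest" "$url"; then
        echo "passive"
        return 0
    fi
    # Both modes failed.
    return 1
}

# Usage (not run here):
#   fetch_ftp "ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.13.assembly.txt" out.txt
```

The function prints which mode succeeded. The cost of trying active mode first is that a blocked connection is only detected after the timeout, which is one argument for simply setting the passive flag instead.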
Dan
> # David Smith
>
> --
> David M Smith <davidsmi at microsoft.com>
> R Community Lead, Revolution Analytics (a Microsoft company)
> Tel: +1 (312) 9205766 (Chicago IL, USA)
> Twitter: @revodavid | Blog: http://blog.revolutionanalytics.com
> We are hiring engineers for Revolution R and Azure Machine Learning.
>
> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Dan
> Tenenbaum
> Sent: Tuesday, August 11, 2015 09:51
> To: Uwe Ligges <ligges at statistik.tu-dortmund.de>
> Cc: R-devel at r-project.org
> Subject: Re: [Rd] download.file() on ftp URL fails in windows with
> default download method
>
>
>
> ----- Original Message -----
> > From: "Dan Tenenbaum" <dtenenba at fredhutch.org>
> > To: "Uwe Ligges" <ligges at statistik.tu-dortmund.de>
> > Cc: "R-devel at r-project.org" <r-devel at r-project.org>
> > Sent: Saturday, August 8, 2015 4:02:54 PM
> > Subject: Re: [Rd] download.file() on ftp URL fails in windows with
> > default download method
> >
> >
> >
> > ----- Original Message -----
> > > From: "Uwe Ligges" <ligges at statistik.tu-dortmund.de>
> > > To: "Dan Tenenbaum" <dtenenba at fredhutch.org>,
> > > "R-devel at r-project.org" <r-devel at r-project.org>
> > > Sent: Saturday, August 8, 2015 3:57:34 PM
> > > Subject: Re: [Rd] download.file() on ftp URL fails in windows
> > > with
> > > default download method
> > >
> > >
> > >
> > > On 08.08.2015 01:11, Dan Tenenbaum wrote:
> > > > Hi,
> > > >
> > > >> url <-
> > > >> "ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.13.assembly.txt"
> > > >> download.file(url, tempfile())
> > > > trying URL
> > > > 'ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.13.assembly.txt'
> > > > Error in download.file(url, tempfile()) :
> > > > cannot open URL
> > > > 'ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_000001405.13.assembly.txt'
> > > > In addition: Warning message:
> > > > In download.file(url, tempfile()) : InternetOpenUrl failed: ''
> > > >
> > > > If I set method="curl" it works fine. This was on R-3.2.2-beta
> > > > (sessionInfo() below) but I got the same results in R-3.2.1 and
> > > > R-devel.
> > > >
> > > > This does not happen on Windows Server 2008 but it happens on
> > > > Windows Server 2012.
> > >
> > >
> > > > Thanks for letting us know. The most recent machine I checked with
> > > > is
> > > > Windows Server 2008 R2 and I have not got problems on those. Can
> > > > someone else reproduce this on Windows Server 2012?
> > >
> >
> > If you like I can give you temporary access (via remote desktop) to
> > a
> > machine in the Amazon cloud.
> > You can also download a Vagrant box here:
> >
> > https://atlas.hashicorp.com/boxes/search?utf8=%E2%9C%93&sort=&provider=&q=windows+server+2012
> >
>
> Just wanted to check in about this to see whether anyone else has
> been able to reproduce this, or if Uwe has, or if anyone needs help
> setting up a test environment either in the cloud or by using a VM
> (like with Vagrant). I would be more than happy to help. I can set
> up a temporary instance in the cloud that interested parties could
> access at no cost.
>
> This issue looks like a showstopper for Bioconductor; we are in the
> process of moving our build system, and we were upgrading from
> Windows Server 2008 to Windows Server 2012 in the process, but this
> issue is going to affect a lot of packages if it is not resolved.
>
> What I can say is that it does not seem like a firewall issue, as the
> download works fine if I specify method="curl" (or libcurl) or
> paste the url into a browser, and I get the same results whether
> Windows Firewall is on or off.
>
> My naive guess is that the InternetOpenUrl API has changed in between
> Windows Server 2008 and Windows Server 2012.
>
> The offending call to this API seems to be at
> src/modules/internet/internet.c:#908 (in the R-3.2 branch; I did try
> R-devel as of r68987 and it still has this problem).
>
> I am really hoping something can be done about this before the
> release of R-3.2.2.
>
> Thanks!
> Dan
>
>
>
> > Dan
> >
> >
> >
> > > Best,
> > > Uwe Ligges
> > >
> > > >
> > > > Dan
> > > >
> > > >> sessionInfo()
> > > > > R version 3.2.2 beta (2015-08-05 r68859)
> > > > > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > > > > Running under: Windows Server 2012 x64 (build 9200)
> > > > >
> > > > > locale:
> > > > > [1] LC_COLLATE=English_United States.1252
> > > > > [2] LC_CTYPE=English_United States.1252
> > > > > [3] LC_MONETARY=English_United States.1252
> > > > > [4] LC_NUMERIC=C
> > > > > [5] LC_TIME=English_United States.1252
> > > > >
> > > > > attached base packages:
> > > > > [1] stats graphics grDevices utils datasets methods base
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >
> > >
> >
>