[R] Problems with getURL (RCurl) to obtain list files of an ftp directory

Duncan Temple Lang duncan at wald.ucdavis.edu
Fri Oct 12 18:41:39 CEST 2012


Hi Francisco

  The code gives me the correct results, and you say it works for you on a Windows machine.
So while the cause could be different versions of software (e.g. libcurl, RCurl, etc.),
the presence of the word "squid" in the HTML suggests to me that
your machine/network is using the proxy/caching software Squid. Squid intercepts
requests, caches the results locally and shares them across
local users.  So if Squid has already retrieved that page for an HTML client (e.g. a browser, or
a request with Content-Type set to text/html), it may be returning that cached copy for your FTP request.
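
A quick way to test the proxy hypothesis is sketched below. It is only a
sketch: the helper names (proxy_env, looks_like_html, mentions_squid) are
hypothetical and not part of RCurl, and the list of environment variables is
my assumption about what libcurl typically consults for an ftp:// URL.

```r
# Hypothetical helpers (base R only) for checking whether a proxy is involved.

# Return the proxy-related environment variables that are currently set.
# Assumption: these are the names libcurl usually looks at for FTP requests.
proxy_env <- function() {
  vars <- c("ftp_proxy", "FTP_PROXY", "all_proxy", "ALL_PROXY",
            "no_proxy", "NO_PROXY")
  vals <- Sys.getenv(vars, unset = NA)
  vals[!is.na(vals) & nzchar(vals)]
}

# TRUE if the text looks like an HTML page rather than a plain FTP listing.
looks_like_html <- function(txt) {
  any(grepl("<!DOCTYPE HTML|<HTML", txt, ignore.case = TRUE))
}

# TRUE if the text mentions Squid anywhere.
mentions_squid <- function(txt) {
  any(grepl("squid", txt, ignore.case = TRUE))
}
```

Calling proxy_env() in the affected R session shows whether a proxy is
configured, and looks_like_html(txt) / mentions_squid(txt) can be applied to
the string returned by getURL() before it is split into lines.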

One thing I like to do when debugging RCurl calls is to add
  verbose = TRUE
to the .opts argument and then see the information about the communication.
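
For example, something along these lines. This is a sketch, not a tested
fix: list_ftp_dir() is a hypothetical wrapper, and the bypass_proxy switch
rests on the assumption that passing an empty proxy string through RCurl makes
libcurl ignore any proxy set in the environment for that request.

```r
# Hypothetical wrapper around RCurl::getURL for listing an FTP directory.
# RCurl must be installed when the function is called.
list_ftp_dir <- function(url, verbose = TRUE, bypass_proxy = FALSE) {
  # ftplistonly asks for names only; verbose prints the conversation
  # between libcurl and the server (or proxy) to the console.
  opts <- list(ftplistonly = TRUE, verbose = verbose)
  if (bypass_proxy) {
    # Assumption: an empty proxy string disables proxy use for this request.
    opts$proxy <- ""
  }
  txt <- RCurl::getURL(url, .opts = opts)
  # Split on line ends, tolerating the CRLF that FTP servers often send.
  strsplit(txt, "\r?\n")[[1]]
}

# Usage (needs network access, so it is left commented out):
# items <- list_ftp_dir(
#   "ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/")
```

If the verbose output shows a connection to localhost port 3128 rather than to
the FTP server itself, the request is going through the proxy.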

   D.

On 10/11/12 11:37 AM, Francisco Zambrano wrote:
> Dear all,
> 
> I have a problem with the 'getURL' command from the RCurl package, which I
> have been using to obtain an FTP directory listing for the MOD16 (ET, DSI)
> products and then to download them (part of a script by Tomislav
> Hengl, spatial-analyst). Instead of the list of files (from the FTP server), I am
> getting complete HTML code. Does anyone know why this might happen?
> 
> These are the steps I have been following:
> 
>> MOD16A2.doy <- 'ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/'
> 
>> items <- strsplit(getURL(MOD16A2.doy, .opts=curlOptions(ftplistonly=TRUE)), "\n")[[1]]
> 
>> items #results
> 
> [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"
> http://www.w3.org/TR/html4/loose.dtd\">\n<!-- HTML listing generated by
> Squid 2.7.STABLE9 -->\n<!-- Wed, 10 Oct 2012 13:43:53 GMT
> -->\n<HTML><HEAD><TITLE>\nFTP Directory:
> ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n</TITLE>\n<STYLE
> type=\"text/css\"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}--></STYLE>\n</HEAD><BODY>\n<H2>\nFTP
> Directory: <A HREF=\"/\">ftp://ftp.ntsg.umt.edu</A>/<A
> HREF=\"/pub/\">pub</A>/<A HREF=\"/pub/MODIS/\">MODIS</A>/<A
> HREF=\"/pub/MODIS/Mirror/\">Mirror</A>/<A
> HREF=\"/pub/MODIS/Mirror/MOD16/\">MOD16</A>/<A
> HREF=\"/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\">MOD16A2.105_MERRAGMAO</A>/</H2>\n<PRE>\n<A
> HREF=\"../\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dirup.gif\"
> ALT=\"[DIRUP]\"></A> <A HREF=\"../\">Parent Directory</A> \n<A
> HREF=\"GEOTIFF_0.05degree/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"GEOTIFF_0.05degree/\">GEOTIFF_0.05degree</A>
> . . . . . . . Jun  3 18:00        \n<A HREF=\"GEOTIFF_0.5degree/\"><IMG
> border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"GEOTIFF_0.5degree/\">GEOTIFF_0.5degree</A>. .
> . . . . . . Jun  3 18:01        \n<A HREF=\"Y2000/\"><IMG border=\"0\"
> SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2000/\">Y2000</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2001/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2001/\">Y2001</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2002/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2002/\">Y2002</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2003/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2003/\">Y2003</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2004/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2004/\">Y2004</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2005/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2005/\">Y2005</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2006/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2006/\">Y2006</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2007/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2007/\">Y2007</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2008/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2008/\">Y2008</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2009/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2009/\">Y2009</A>. . . . . . . . . . . . . .
> Dec 23  2010        \n<A HREF=\"Y2010/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2010/\">Y2010</A>. . . . . . . . . . . . . .
> Feb 20  2011        \n<A HREF=\"Y2011/\"><IMG border=\"0\" SRC=\"
> http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\"
> ALT=\"[DIR] \"></A> <A HREF=\"Y2011/\">Y2011</A>. . . . . . . . . . . . . .
> Mar 12  2012        \n</PRE>\n<HR noshade
> size=\"1px\">\n<ADDRESS>\nGenerated Wed, 10 Oct 2012 13:43:53 GMT by
> localhost (squid/2.7.STABLE9)\n</ADDRESS></BODY></HTML>\n"
> 
> The curious thing is that getURL was working well until, for reasons I don't
> know, it stopped. The same command works fine on Windows.
> 
> sessionInfo() gives the following:
> 
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] MODIS_0.5-8     maptools_0.8-16 lattice_0.20-0  foreign_0.8-48  date_1.2-32
>  [6] RCurl_1.95-0.1  bitops_1.0-4.1  rgdal_0.7-19    raster_2.0-12   sp_0.9-99
> 
> loaded via a namespace (and not attached):
> [1] grid_2.14.1  tools_2.14.1
> 
> Kind regards to all,
> 
> Francisco Zambrano Bigiarini
> INIA Quilamapu, Chillán, Chile
> 



