[R] File Downloading Problem

Duncan Temple Lang duncan at wald.ucdavis.edu
Mon Nov 1 18:03:58 CET 2010


I got this working almost immediately with RCurl, although one has to
specify some value for the useragent option or the same error occurs.

The issue is that R does not add an Accept entry to the HTTP request header.
It should add something like
   Accept: */*

Using RCurl,
 u = "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"
 o = getURLContent(u, verbose = TRUE, useragent = getOption("HTTPUserAgent"))

succeeds (but not if there is no useragent).
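
For anyone who also wants to send the Accept header explicitly, here is a
minimal sketch using RCurl's httpheader option. I have not confirmed that this
server needs the header once a user agent is supplied; the fetch() helper and
the fallback agent string "R" are made up for illustration.

```r
# Sketch only: the Accept value and the fallback agent string are assumptions.
u = "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"

# Use R's own agent string when it is set, otherwise fall back to a placeholder.
ua = getOption("HTTPUserAgent")
if (is.null(ua)) ua = "R"

# Hypothetical helper; RCurl must be installed for the call to work.
fetch = function(url)
  RCurl::getURLContent(url, useragent = ua, httpheader = c(Accept = "*/*"))

# o = fetch(u)    # needs network access
```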


We could fix R's download.file() to send Accept: */*,
or allow general headers to be specified either as an option for
all requests, or as a parameter of download.file() (or both).
Or we could have the makeUserAgent() function in utils be more customizable
through options, or allow the R user to specify the function herself.
But while this would be good, the HTTP facilities in R are not
intended to be as general as something like libcurl (and hence RCurl).

Unless there is a compelling reason to enhance R's internal facilities,
I suggest people use something like libcurl.  This approach also has
the advantage of having the data directly in memory and avoiding writing
it to disk and then reading it back in, e.g.

  library(Rcompression)
  z = zipArchive(o)
  names(z)
  read.csv(textConnection(z[[1]]))
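
If Rcompression is not installed, a rough fallback with base R alone is to
write the raw vector to a temporary file and read it through unz(). This gives
up the in-memory advantage mentioned above. The sketch assumes o is the raw
vector returned by getURLContent() and that the CSV of interest is the first
entry in the archive; read_zipped_csv() is a hypothetical helper name.

```r
# Hypothetical helper: read the first file of a zip archive held in a raw vector.
read_zipped_csv = function(raw_zip, ...) {
  tmp = tempfile(fileext = ".zip")
  on.exit(unlink(tmp))                     # remove the temporary file afterwards
  writeBin(raw_zip, tmp)                   # the archive has to be on disk for unz()
  first = unzip(tmp, list = TRUE)$Name[1]  # name of the first entry in the archive
  read.csv(unz(tmp, first), ...)
}

# d = read_zipped_csv(o)
```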


  D.


On 11/1/10 8:27 AM, Santosh Srinivas wrote:
> It's strange and the internet connection is fine because I am able to get
> data from yahoo.
> This was working till just yesterday ... strange if the website is creating
> issues with public access of basic data!
> 
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net] 
> Sent: 01 November 2010 20:48
> To: Duncan Murdoch
> Cc: Santosh Srinivas; 'Rhelp'
> Subject: Re: [R] File Downloading Problem
> 
> 
> On Nov 1, 2010, at 10:41 AM, Duncan Murdoch wrote:
> 
>> On 01/11/2010 10:37 AM, Santosh Srinivas wrote:
>>> Nope Duncan ... no changes .. the same old way without a proxy ...  
>>> actually
>>> the download.file is being returned "403 forbidden" which is strange.
>>>
>>> These are just two lines that I am trying to run.
>>>
>>> sURL <- "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"
>>> download.file(sURL,"test.zip")
>>>
>>> Put the same URL in a browser and it works fine.
>>
>> It doesn't work for me, so presumably there is some kind of security  
>> setting at the site (a cookie?), which allows your browser, but  
>> doesn't allow you to use R, or me to use anything.
> 
> Firefox in a Mac platform will download and unzip the file with no  
> security complaints and no cookie appears to be set when downloading,  
> but that code will not access the file, nor will my efforts to wrap  
> the URL in url() or unz() so it seems more likely that Santosh and I  
> do not understand the file opening processes that R supports.
> 
>  > con = unz(description="http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip",
>              file="~/cm01NOV2010bhav.csv")
>  > test.df <-  read.csv(file=con)
> Error in open.connection(file, "rt") : cannot open the connection
> In addition: Warning message:
> In open.connection(file, "rt") :
>    cannot open zip file
> 'http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip'
> 
> 
>


