[R-SIG-Mac] Download data from Internet contained in a Zip file

David Winsemius dwinsemius at comcast.net
Mon Dec 26 18:40:20 CET 2016


> On Dec 26, 2016, at 3:19 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
> 
> Hi David et al,
> 
> Thanks for the pointers. With your approach, I now see the
> "temp.zip" file in my working folder.
> 
> However, I still could not extract the data from it. I tried the
> unzip() function, but it did not work:
> 
>> unzip("temp.zip")
> Warning message:
> In unzip("temp.zip") : error 1 in extracting from zip file

I didn't try to use R to unzip it at first; unzipping it with my system's own tools worked fine.
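
Before extracting, it may help to check that R finds the file where unzip() is looking (a relative name such as "temp.zip" is resolved against getwd()) and that R can read the archive's table of contents. A minimal sketch; inspect_zip() is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: report what R can see of a zip archive.
# Returns a list with:
#   exists   - does the file exist at this path?
#   bytes    - file size in bytes (NA if the file is missing)
#   contents - data frame from unzip(list = TRUE), or NULL if R
#              could not read the archive's table of contents
inspect_zip <- function(zipfile) {
  zipfile <- path.expand(zipfile)
  if (!file.exists(zipfile)) {
    return(list(exists = FALSE, bytes = NA_real_, contents = NULL))
  }
  contents <- tryCatch(
    utils::unzip(zipfile, list = TRUE),   # list only, extract nothing
    error   = function(e) NULL,
    warning = function(w) NULL
  )
  list(exists   = TRUE,
       bytes    = file.info(zipfile)$size,
       contents = contents)
}

# Usage, assuming "temp.zip" is in the working directory:
# inspect_zip("temp.zip")
```

If the internal method still fails, utils::unzip() also accepts an `unzip` argument naming an external program (for example `getOption("unzip")`), which may be worth trying.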

I'm not able to reproduce the error in R:

> ?unzip
> dat <- unzip("~/temp.zip")
> str(dat)
 chr "./NAV_File_23122016.out"
> dat_in <- read.table(dat)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 17 elements
> dat_in <- read.csv(dat, header=FALSE)
> str(dat_in)
'data.frame':	75 obs. of  6 variables:
 $ V1: Factor w/ 1 level "12/23/2016": 1 1 1 1 1 1 1 1 1 1 ...
 $ V2: Factor w/ 7 levels "PFM001","PFM002",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ V3: Factor w/ 7 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ V4: Factor w/ 75 levels "SM001001","SM001002",..: 5 7 8 11 12 13 1 2 3 4 ...
 $ V5: Factor w/ 75 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED SCHEME A - TIER I",..: 62 59 63 37 56 57 54 55 60 58 ...
 $ V6: num  21.7 21.1 20.8 11.7 10.1 ...
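
The read.table() call above fails because read.table() splits on whitespace by default, while the .out file is comma-separated with no header row; read.csv(header = FALSE) handles that. The list, extract and read steps could be wrapped up roughly as follows. This is only a sketch: read_zipped_csv() is a hypothetical helper name, and it assumes the first entry in the archive is the data file.

```r
# Hypothetical helper: extract the first file from a zip archive and
# read it as a comma-separated table without a header row.
read_zipped_csv <- function(zipfile, exdir = tempdir()) {
  # Table of contents: data frame with columns Name, Length, Date.
  listing <- utils::unzip(zipfile, list = TRUE)
  # Extract only the first entry; unzip() returns the extracted path.
  extracted <- utils::unzip(zipfile, files = listing$Name[1], exdir = exdir)
  # stringsAsFactors = FALSE keeps text columns as character rather
  # than the factors shown in the str() output above.
  utils::read.csv(extracted, header = FALSE, stringsAsFactors = FALSE)
}

# Usage, assuming "temp.zip" was downloaded as earlier in the thread:
# nav <- read_zipped_csv("temp.zip")
# str(nav)
```

Extracting into tempdir() rather than the working directory keeps the .out file from cluttering the working directory.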


-- 
David.
> 
> When I open the link
> "https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip"
> manually, download the zip file, and unzip it, I get a file
> called "NAV_File_23122016.out", which I can then open in Excel to see
> all the data.
> 
> I am trying to perform the same task through R, so that
> I can load the data automatically, directly from the web.
> 
> Any ideas, please? I am using the version of R below (I know it is quite
> an old version, but I am not currently in a position to upgrade my
> MacBook):
> 
>> R.Version()
> $platform
> [1] "x86_64-apple-darwin10.8.0"
> 
> $arch
> [1] "x86_64"
> 
> $os
> [1] "darwin10.8.0"
> 
> $system
> [1] "x86_64, darwin10.8.0"
> 
> $status
> [1] ""
> 
> $major
> [1] "3"
> 
> $minor
> [1] "2.1"
> 
> $year
> [1] "2015"
> 
> $month
> [1] "06"
> 
> $day
> [1] "18"
> 
> $`svn rev`
> [1] "68531"
> 
> $language
> [1] "R"
> 
> $version.string
> [1] "R version 3.2.1 (2015-06-18)"
> 
> $nickname
> [1] "World-Famous Astronaut"
> 
> On Mon, Dec 26, 2016 at 7:18 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> On Dec 25, 2016, at 3:46 PM, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
>>> 
>>> Your R build does not support HTTPS.
>>> 
>>> I suggest that you use the curl package if you can. HTTP support in
>>> base R is very limited currently.
>> 
>> I generally use the downloader package. It sets up the call to download.file so that it succeeds with https URLs.
>> 
>> 
>> install.packages("downloader", dependencies=TRUE)
>> trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/mavericks/contrib/3.3/downloader_0.4.tgz'
>> Content type 'application/x-gzip' length 19459 bytes (19 KB)
>> ==================================================
>> downloaded 19 KB
>> 
>> 
>> The downloaded binary packages are in
>>        /var/folders/68/vh2f8kzn09j8954r6q9100yh0000gn/T//Rtmpq8DVG4/downloaded_packages
>>> library(downloader)
>>> help(pac=downloader)
>> starting httpd help server ... done
>>> download("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip","temp.zip")
>> 
>> # Requires both a source and destination file name.
>> 
>> trying URL 'https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip'
>> Content type 'application/octet-stream' length 1228 bytes
>> ==================================================
>> downloaded 1228 bytes
>> 
>> --
>> David.
>>> 
>>> Gabor
>>> 
>>> 
>>> 
>>> On Sun, Dec 25, 2016 at 10:37 PM, Christofer Bogaso
>>> <bogaso.christofer at gmail.com> wrote:
>>>> Hi again,
>>>> 
>>>> I posted this on the general R list, but it was suggested that I ask in
>>>> this group, since I am using Mac OS 10.7.5.
>>>> 
>>>> I was following the instructions at
>>>> "http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data"
>>>> to download data contained in a zip file from the
>>>> address:
>>>> 
>>>> https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip
>>>> 
>>>> However, when I tried to follow the instructions, I got the error below:
>>>> 
>>>>> temp <- tempfile()
>>>>> download.file("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip",temp)
>>>> Error in download.file("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip",  :
>>>>   unsupported URL scheme
>>>> 
>>>> Can someone here please tell me what went wrong above?
>>>> 
>>>> I would highly appreciate your feedback.
>>>> 
>>>> Thanks for your time.
>>>> 
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>> 
>> 


