[R-SIG-Mac] Download data from Internet contained in a Zip file
David Winsemius
dwinsemius at comcast.net
Mon Dec 26 18:40:20 CET 2016
> On Dec 26, 2016, at 3:19 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
>
> Hi David et al,
>
> Thanks for showing the pointers. With your approach, I see the
> "temp.zip" file in my working folder.
>
> However, I still could not extract the data from it. I tried the
> unzip() function, but it fails:
>
>> unzip("temp.zip")
> Warning message:
> In unzip("temp.zip") : error 1 in extracting from zip file
I hadn't tried unzipping it with R; my system's unzip facility worked fine.
Trying it in R now, I'm not able to reproduce your error:
> ?unzip
> dat <- unzip("~/temp.zip")
> str(dat)
chr "./NAV_File_23122016.out"
> dat_in <- read.table(dat)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 17 elements
> dat_in <- read.csv(dat, header=FALSE)
> str(dat_in)
'data.frame': 75 obs. of 6 variables:
$ V1: Factor w/ 1 level "12/23/2016": 1 1 1 1 1 1 1 1 1 1 ...
$ V2: Factor w/ 7 levels "PFM001","PFM002",..: 1 1 1 1 1 1 1 1 1 1 ...
$ V3: Factor w/ 7 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED",..: 6 6 6 6 6 6 6 6 6 6 ...
$ V4: Factor w/ 75 levels "SM001001","SM001002",..: 5 7 8 11 12 13 1 2 3 4 ...
$ V5: Factor w/ 75 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED SCHEME A - TIER I",..: 62 59 63 37 56 57 54 55 60 58 ...
$ V6: num 21.7 21.1 20.8 11.7 10.1 ...
--
David.
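
The unzip-then-read steps above can be wrapped up as a small helper. This is only a minimal sketch: the name `read_zipped_csv` is hypothetical, and it assumes the archive holds a single comma-separated file without a header row, as the `.out` file in this thread does. It extracts into a fresh temporary directory so the working folder stays clean, and stops with a clear message if `unzip()` extracts nothing.

```r
# Hypothetical helper: extract a zip archive into a temporary directory
# and read the first extracted file as a headerless CSV.
read_zipped_csv <- function(zip_path, ...) {
  stopifnot(file.exists(zip_path))

  # Extract into a fresh temporary directory.
  exdir <- tempfile("unzipped_")
  dir.create(exdir)

  # unzip() returns (invisibly) the paths of the files it extracted.
  extracted <- unzip(zip_path, exdir = exdir)
  if (length(extracted) == 0L) {
    stop("nothing was extracted from ", zip_path)
  }

  # The file in this thread has no header row, hence header = FALSE.
  read.csv(extracted[1], header = FALSE, ...)
}
```

With the file downloaded earlier in the thread, `dat_in <- read_zipped_csv("temp.zip")` followed by `str(dat_in)` should give the same kind of summary as shown above.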
>
> When I open the link
> "https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip"
> manually, download the zip file, and unzip it, I get a file called
> "NAV_File_23122016.out", which I can then open in Excel to see all
> the data.
>
> I am trying to do the same thing through R, so that I can load the
> data automatically, directly from the web.
>
> Any ideas, please? I am using the version of R below (I know it is
> quite old; however, I am not currently in a position to upgrade my
> MacBook).
>
>> R.Version()
> $platform
> [1] "x86_64-apple-darwin10.8.0"
>
> $arch
> [1] "x86_64"
>
> $os
> [1] "darwin10.8.0"
>
> $system
> [1] "x86_64, darwin10.8.0"
>
> $status
> [1] ""
>
> $major
> [1] "3"
>
> $minor
> [1] "2.1"
>
> $year
> [1] "2015"
>
> $month
> [1] "06"
>
> $day
> [1] "18"
>
> $`svn rev`
> [1] "68531"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 3.2.1 (2015-06-18)"
>
> $nickname
> [1] "World-Famous Astronaut"
>
> On Mon, Dec 26, 2016 at 7:18 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>> On Dec 25, 2016, at 3:46 PM, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
>>>
>>> Your R build does not support HTTPS.
>>>
>>> I suggest that you use the curl package if you can. HTTP support in
>>> base R is very limited currently.
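
A minimal sketch of that suggestion follows, under some assumptions. The name `fetch_binary` is hypothetical. The sketch first checks whether this R build reports libcurl support and, if so, uses `download.file()` with `method = "libcurl"`; otherwise it falls back to `curl::curl_download()` from the curl package, if that is installed. Whether either route is available on a given older Mac build is not something this thread confirms.

```r
# Hypothetical helper: download a binary file over HTTPS, preferring
# base R's libcurl method and falling back to the curl package.
fetch_binary <- function(url, destfile) {
  caps <- capabilities("libcurl")
  has_libcurl <- length(caps) == 1L && isTRUE(unname(caps))

  if (has_libcurl) {
    # mode = "wb" keeps the zip file byte-for-byte intact.
    download.file(url, destfile, method = "libcurl", mode = "wb")
  } else if (requireNamespace("curl", quietly = TRUE)) {
    curl::curl_download(url, destfile, mode = "wb")
  } else {
    stop("no HTTPS-capable download method found; ",
         "try install.packages(\"curl\")")
  }
  invisible(destfile)
}
```

For the file in this thread the call would be `fetch_binary("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip", "temp.zip")`.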
>>
>> I generally use the downloader package. It sets up the call to download.file so that it succeeds with https URLs.
>>
>>
>> install.packages("downloader", dependencies=TRUE)
>> trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/mavericks/contrib/3.3/downloader_0.4.tgz'
>> Content type 'application/x-gzip' length 19459 bytes (19 KB)
>> ==================================================
>> downloaded 19 KB
>>
>>
>> The downloaded binary packages are in
>> /var/folders/68/vh2f8kzn09j8954r6q9100yh0000gn/T//Rtmpq8DVG4/downloaded_packages
>>> library(downloader)
>>> help(pac=downloader)
>> starting httpd help server ... done
>>> download("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip","temp.zip")
>>
>> # Requires both a source and destination file name.
>>
>> trying URL 'https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip'
>> Content type 'application/octet-stream' length 1228 bytes
>> ==================================================
>> downloaded 1228 bytes
>>
>> --
>> David.
>>>
>>> Gabor
>>>
>>>
>>>
>>> On Sun, Dec 25, 2016 at 10:37 PM, Christofer Bogaso
>>> <bogaso.christofer at gmail.com> wrote:
>>>> Hi again,
>>>>
>>>> I posted this on the general R list; however, this group was
>>>> suggested since I am using Mac OS 10.7.5.
>>>>
>>>> I was following the instructions at
>>>> "http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data"
>>>> to download data contained in a zip file from the Internet, at the
>>>> address:
>>>>
>>>> https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip
>>>>
>>>> However, when I tried to follow the instructions, I got the error below:
>>>>
>>>>> temp <- tempfile()
>>>>> download.file("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip",temp)
>>>> Error in download.file("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip", :
>>>> unsupported URL scheme
>>>>
>>>> Can someone here please tell me what went wrong above?
>>>>
>>>> I would highly appreciate your feedback.
>>>>
>>>> Thanks for your time.
>>>>
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>
>>