[R] How to download and unzip data in a loop

David Winsemius dwinsemius at comcast.net
Thu Feb 5 22:18:48 CET 2015


On Feb 5, 2015, at 10:03 AM, Alexandra Catena wrote:

> Thank you guys for the response.
> 
> I'm trying to download the last ten years of meteorology data from a
> weather station in Livermore from the URL:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
> The Livermore station code is 724927-23285.  If I wanted to download data
> from 2005, the URL would be:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz
> 
> Once I download the data into a temporary file, I want to unzip it and
> store it into another directory where I can access it.
> 
> Also, why are there 2015 indices instead of just 10 when I'm only looping
> through 2005:2015?

When you assign to file[2005], R fills in the positions from 1 to 2004 with NA's, and then adds to that vector with each further run through the loop.
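
A small sketch of that padding behaviour, and of one way to avoid it by indexing on position rather than on the year itself. The names 'sparse', 'paths' and 'k' are made up for illustration and are not from your code. Note also that 2005:2015 is 11 years, not 10.

```r
# Assigning to position 2005 of an empty vector pads positions 1..2004 with NA.
sparse <- character(0)
sparse[2005] <- "724927-23285-2005.gz"
length(sparse)        # 2005
sum(is.na(sparse))    # 2004

# Indexing by position in 'years' keeps one element per year.
years <- 2005:2015
paths <- character(length(years))
for (k in seq_along(years)) {
  paths[k] <- sprintf("724927-23285-%d.gz", years[k])
}
length(paths)         # 11
```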

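The quotes around 'files' prevent evaluation of your (very poorly named) 'files' object: gzfile('files') looks for a file literally named "files" rather than using the value stored in the variable.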
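
A rough sketch of how the URL and path construction could look with those points addressed. It is not a tested download script: 'station', 'urls', 'dest' and 'fetch_all' are illustrative names of my own, and the download call is left commented out because it needs network access.

```r
station <- "724927-23285"   # Livermore station code from your message
years <- 2005:2015

# sprintf() is vectorized, so this builds one URL per year.
urls <- sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%d.gz",
                years, station, years)
dest <- file.path(tempdir(), basename(urls))

# Hypothetical helper: fetch each URL to its matching local path.
fetch_all <- function(urls, dest) {
  for (k in seq_along(urls)) {
    download.file(urls[k], dest[k], mode = "wb")  # "wb" keeps the .gz intact
  }
  invisible(dest)
}
# fetch_all(urls, dest)   # not run here

# dir() takes a regular expression, not a glob, and full.names = TRUE
# returns paths that can be opened from any working directory.
files <- dir(tempdir(), pattern = "\\.gz$", full.names = TRUE)
# Then pass the object, unquoted: gzfile(files[1]), not gzfile('files')
```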

The error I get after correcting those semantic errors is:

>   read.table(gzfile(files))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 17 elements

... thus validating Jon's warning.
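
If the goal is only to unzip each file and store it in another directory, one option is to skip read.table() and copy the raw lines with readLines(). The sketch below uses a small made-up .gz file as a stand-in for the station data (the contents and the 'unzipped' directory name are assumptions for illustration), so it runs without a download.

```r
# Stand-in .gz file with an irregular number of fields per line.
gz_path <- file.path(tempdir(), "example-irregular.gz")
con <- gzfile(gz_path, "w")
writeLines(c("a b c d", "e f", "g h i"), con)
close(con)

# read.table() wants the same number of fields on every line, so it stops.
failed <- inherits(try(read.table(gzfile(gz_path)), silent = TRUE),
                   "try-error")

# readLines() makes no assumption about columns.
con <- gzfile(gz_path, "rt")
raw_lines <- readLines(con)
close(con)

# Write the decompressed text into a separate directory.
out_dir <- file.path(tempdir(), "unzipped")
dir.create(out_dir, showWarnings = FALSE)
out_path <- file.path(out_dir, sub("\\.gz$", "", basename(gz_path)))
writeLines(raw_lines, out_path)
```

Parsing the decompressed lines into columns is a separate step that depends on the file's record layout.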


> 
> Thanks,
> Alexandra
> 
> On Thu, Feb 5, 2015 at 3:11 AM, Jon Skoien <jon.skoien at jrc.ec.europa.eu>
> wrote:
> 
>> In addition to following Jim's suggestion, you should probably also use
>> full.names = TRUE, otherwise you will try to open a connection to files in
>> your current directory, not in tmpdir.
>> Another thing is that the unzipped files appear irregular with respect to
>> columns, so read.table might not work too well.
>> 
>> Jon
>> 
>> 
>> On 2/5/2015 11:30 AM, jim holtman wrote:
>> 
>>> try taking the quotes off of 'files'
>>> 
>>> 
>>> Jim Holtman
>>> Data Munger Guru
>>> 
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>> 
>>> On Wed, Feb 4, 2015 at 5:24 PM, Alexandra Catena <amc5981 at gmail.com>
>>> wrote:
>>> 
>>> Hi All,
>>>> 
>>>> I need to loop through and download the past 10 years of met data to a
>>>> temporary directory.  I then need to unzip it and place it into another
>>>> directory.
>>>> 
>>>> 
>>>> year = (2005:2015)
>>>> 
>>>> for (i in year)
>>>>   tmpdir = tempdir()
>>>>   file[i] = file.path(tmpdir, sprintf('724927-23285-%4i.gz', i))
>>>>   url = sprintf('ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%4i/724927-23285-%4i.gz', i, i)
>>>>   #file = basename(url)
>>>>   download.file(url, file[i])
>>>>   files = dir(tmpdir, '*.gz', full.names=FALSE)
>>>>   read.table(gzfile('files'))
>>>> 
>>>> 
>>>> 
>>>> 'file' returns 2015 indices with "/tmp/RtmpKvB4Wz/724927-23285-2015.gz"
>>>> next to 2015, and 'files' returns 724927-23285-2015.gz.  However, when I
>>>> try to unzip the gz file using the last line, it says it cannot open the
>>>> connection and the probable reason is that there is no such file or
>>>> directory.
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Alexandra
>>>> 


David Winsemius
Alameda, CA, USA


