[R] How to download and unzip data in a loop
David Winsemius
dwinsemius at comcast.net
Thu Feb 5 22:18:48 CET 2015
On Feb 5, 2015, at 10:03 AM, Alexandra Catena wrote:
> Thank you guys for the response.
>
> I'm trying to download the last ten years of meteorology data from a
> weather station in Livermore from the URL:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
> The Livermore station code is 724927-23285. If I wanted to download data
> from 2005, the URL would be:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz
>
> Once I download the data into a temporary file, I want to unzip it and
> store it into another directory where I can access it.
>
> Also, why are there 2015 indices instead of just 10 when I'm only looping
> through 2005:2015?
When you assign to file[2005], R extends the vector and fills positions 1 through 2004 with NA; each later pass through the loop then adds one more element, so the vector ends up with 2015 positions.
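A minimal sketch of that padding behaviour, and of indexing by position instead of by year. The names `paths`, `years` and `dest` are made up for illustration and are not from the original post:

```r
# Assigning to position 2005 of an empty vector pads positions 1..2004 with NA
paths <- character(0)
paths[2005] <- "724927-23285-2005.gz"
n_total <- length(paths)        # total number of positions
n_missing <- sum(is.na(paths))  # padded positions

# Indexing by position in the sequence of years avoids the padding
years <- 2005:2015
dest <- character(length(years))
for (k in seq_along(years)) {
  dest[k] <- sprintf("724927-23285-%d.gz", years[k])
}
```

Note that 2005:2015 spans eleven years, so `dest` has eleven elements rather than ten.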
The quotes around 'files' prevent evaluation of your (very poorly named) 'files' object: gzfile('files') looks for a file literally named "files".
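To illustrate both points (Jim's about the quotes and Jon's about full.names), here is a small self-contained sketch that needs no download. The scratch directory "gz_demo" and the file "example.gz" are hypothetical names used only for this demonstration:

```r
# Write a small gzip-compressed text file into a scratch directory
tmpdir <- file.path(tempdir(), "gz_demo")  # hypothetical scratch directory
dir.create(tmpdir, showWarnings = FALSE)
con <- gzfile(file.path(tmpdir, "example.gz"), open = "wt")
writeLines(c("line one", "line two"), con)
close(con)

# 'pattern' is a regular expression, so "\\.gz$" matches names ending in .gz;
# full.names = TRUE keeps the directory in each returned path
files <- dir(tmpdir, pattern = "\\.gz$", full.names = TRUE)

# Unquoted: the value stored in 'files' is used, not the literal string "files"
con <- gzfile(files[1], open = "rt")
txt <- readLines(con)
close(con)
```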
The error I get after correcting those semantic errors is:
> read.table(gzfile(files))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 17 elements
... thus validating Jon's warning.
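One way the whole task might be sketched, assuming the goal is simply to get the uncompressed text of each yearly file into a directory of your choosing. The helper names `build_urls` and `fetch_and_unzip`, the `out_dir` argument and the ".txt" extension are my own choices for illustration. The files are read with readLines() rather than read.table() because of the irregular columns, so each record arrives as a single string:

```r
station <- "724927-23285"
years <- 2005:2015

# Build one URL per year; sprintf() is vectorised over 'years'
build_urls <- function(station, years) {
  sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%d.gz",
          years, station, years)
}

# Download each .gz file to tempdir() and write its uncompressed text to out_dir
fetch_and_unzip <- function(station, years, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  urls <- build_urls(station, years)
  gz_paths <- file.path(tempdir(), basename(urls))
  out_paths <- file.path(out_dir, sub("\\.gz$", ".txt", basename(urls)))
  for (k in seq_along(urls)) {
    download.file(urls[k], gz_paths[k], mode = "wb")  # binary mode for .gz
    con <- gzfile(gz_paths[k], open = "rt")
    lines <- readLines(con)   # one string per record, no column parsing
    close(con)
    writeLines(lines, out_paths[k])
  }
  invisible(out_paths)
}

urls <- build_urls(station, years)
```

A call such as fetch_and_unzip(station, years, "met_data") would then perform the downloads; "met_data" is a placeholder directory name. Parsing the records into columns is a separate step that depends on the file layout.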
>
> Thanks,
> Alexandra
>
> On Thu, Feb 5, 2015 at 3:11 AM, Jon Skoien <jon.skoien at jrc.ec.europa.eu>
> wrote:
>
>> In addition to following Jim's suggestion, you should probably also use
>> full.names = TRUE, otherwise you will try to open a connection to files in
>> your current directory, not in tmpdir.
>> Another thing is that the unzipped files appear irregular with respect to
>> columns, so read.table might not work too well.
>>
>> Jon
>>
>>
>> On 2/5/2015 11:30 AM, jim holtman wrote:
>>
>>> try taking the quotes off of 'files'
>>>
>>>
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>
>>> On Wed, Feb 4, 2015 at 5:24 PM, Alexandra Catena <amc5981 at gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I need to loop through and download the past 10 years of met data to a
>>>> temporary directory. I then need to unzip it and place it into another
>>>> directory.
>>>>
>>>>
>>>> year = (2005:2015)
>>>>
>>>> for (i in year)
>>>> tmpdir = tempdir()
>>>> file[i] = file.path(tmpdir, sprintf('724927-23285-%4i.gz', i))
>>>> url = sprintf('
>>>> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%4i/724927-23285-%4i.gz', i, i)
>>>> #file = basename(url)
>>>> download.file(url, file[i])
>>>> files = dir(tmpdir, '*.gz', full.names=FALSE)
>>>> read.table(gzfile('files'))
>>>>
>>>>
>>>>
>>>> 'file' returns 2015 indices with "/tmp/RtmpKvB4Wz/724927-23285-2015.gz"
>>>> next to 2015. and files returns 724927-23285-2015.gz. However, when I
>>>> try
>>>> to unzip the gz file using the last line, it says it cannot open the
>>>> connection and the probable reason is that there is no such file or
>>>> directory.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Alexandra
>>>>
David Winsemius
Alameda, CA, USA