[R] how can I import op.gz files with read.csv or otherwise

Rui Barradas ruipbarradas at sapo.pt
Fri Dec 21 22:31:19 CET 2012


Hello,

The file can be read with readLines() on a gzfile() connection. I've
changed url to URL because there's already a function of that name.
I've also changed dest to a file name in the working directory.

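URL <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz"
dest <- "weather.op.gz"
# mode = "wb" keeps the compressed file intact on Windows
download.file(URL, dest, mode = "wb")
# open the gzip file as a text connection and read it line by line
gz <- gzfile(dest, open = "rt")
x <- readLines(gz)
close(gz)
x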
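
If you would rather not keep a downloaded copy, wrapping a binary-mode connection in gzcon() decompresses the data as it is read. The sketch below is only an illustration: it writes a small made-up file to a temporary location so that it runs offline, and the sample lines are hypothetical, not the NOAA data. For the real file you would presumably replace file(tmp, open = "rb") with url(URL, open = "rb"); I have not checked that against the FTP server.

```r
# Made-up sample lines (hypothetical, standing in for the real file)
sample_lines <- c("header line",
                  "285880 99999 20120101",
                  "285880 99999 20120102")

# Write a gzip-compressed text file to a temporary location
tmp <- tempfile(fileext = ".op.gz")
out <- gzfile(tmp, open = "wt")
writeLines(sample_lines, out)
close(out)

# Wrap a binary-mode connection in gzcon() so that reads are decompressed
gz <- gzcon(file(tmp, open = "rb"))
y <- readLines(gz)
close(gz)  # closing the wrapper also closes the underlying connection
```

The point of gzcon() is that file() can be swapped for another connection generator, which gzfile() does not allow.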


As you say, the headers and data are not consistent; it seems some
column headers are missing.
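
To see this kind of mismatch concretely, count.fields() reports how many whitespace-separated fields each line has, and read.table() with skip = 1 and header = FALSE reads the data rows while ignoring the header. The lines below are a made-up sample in which the header has fewer names than the data rows have fields; the column names and values are hypothetical, not taken from the NOAA file.

```r
# Hypothetical sample: 5 header names, but 7 fields in each data row
x_demo <- c("STN WBAN YEARMODA TEMP DEWP",
            "285880 99999 20120101 10.5 24 5.1 24",
            "285880 99999 20120102 12.0 24 6.3 24")

# Count the whitespace-separated fields on each line
tc <- textConnection(x_demo)
n_fields <- count.fields(tc, sep = "")
close(tc)

# Skip the header line and let read.table() assign default names V1, V2, ...
dat <- read.table(text = x_demo, skip = 1, header = FALSE,
                  stringsAsFactors = FALSE)
str(dat)
```

Once the extra columns are identified, proper names can be assigned with names(dat) <- ... by hand.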

Hope this helps,

Rui Barradas

On 21-12-2012 17:44, John Kane wrote:
> Try downloading it and decompress it:
>
>    url <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz"
>    dest <- "/home/john/rdata/weather.op.gz"
>    download.file(url, dest)
>
> However, it does not look like a nicely formatted file and you may have to do some cleanup in a text editor or perhaps load it into a spreadsheet before you read it into R.
>
> I tried the method from the link arun provided and it did not work.  It looks like the headers and data are not consistent.
> John Kane
> Kingston ON Canada
>
>
>> -----Original Message-----
>> From: herrdittmann at yahoo.co.uk
>> Sent: Fri, 21 Dec 2012 14:51:05 +0000 (GMT)
>> To: r-help at r-project.org
>> Subject: [R] how can I import op.gz files with read.csv or otherwise
>>
>> Dear R-users,
>>
>> I am struggling to directly read an "op.gz" file into R. NOAA kindly
>> provides daily weather data on their FTP server for download.
>>
>>
>>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
>>     LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C  LC_TIME=English_United Kingdom.1252
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.1
>>
>> Here is the data set in question:
>> x <- read.csv(file="ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz",
>>               skip = 1, sep = "")
>>
>> and "structure" returns some incomprehensible gibberish:
>>
>>> str(x)
>> 'data.frame':   70 obs. of  6 variables:
>>  $ [unreadable binary characters]: Factor w/ 70 levels [unreadable],..: 44 13 56 64 28 23 67 3 2 33 ...
>>  $ [unreadable binary characters]: Factor w/ 41 levels "",[unreadable],..: 1 1 1 39 1 1 1 8 1 5 ...
>>  $ [unreadable binary characters]: Factor w/ 29 levels "",[unreadable],..: 1 1 1 4 1 1 1 14 1 6 ...
>>  $ [unreadable binary characters]: Factor w/ 17 levels "",[unreadable],..: 1 1 1 16 1 1 1 1 1 1 ...
>>  $ [unreadable binary characters]: Factor w/ 8 levels "",[unreadable],..: 1 1 1 1 1 1 1 1 1 1 ...
>>  $ [unreadable binary characters; output cut off here]
>>
>>
>> While I can manually open and read the op.gz file in a text editor,
>> the file imported with read.csv() or read.table() is simply unreadable.
>>
>> How can I best get the job done? Any pointers, suggestions, ideas most
>> welcome!!
>>
>> Thanks in advance!
>>
>> Bernd
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.