[R] cannot read iso639 table
William Dunlap
wdunlap at tibco.com
Thu Sep 13 21:50:21 CEST 2012
On Windows with R-2.15.1 in a 1252 locale, I had to read (and discard)
the initial 3 bytes (the byte-order mark?) to make things work:
> socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
> readChar(socket, nchars=3, useBytes=TRUE)
[1] "ï»¿"
> d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
> dim(d)
[1] 485 5
> head(d)
   V1 V2 V3             V4      V5
1 aar    aa           Afar    afar
2 abk    ab      Abkhazian abkhaze
3 ace             Achinese    aceh
4 ach                Acoli   acoli
5 ada              Adangme adangme
6 ady       Adyghe; Adygei  adyghé
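The readChar() trick above throws away 3 bytes unconditionally. A minimal sketch of a more defensive variant, which drops the bytes only when they really are the UTF-8 byte-order mark (EF BB BF), might look like the following. The helper name read_pipe_table_skip_bom, the temporary file, and the three sample lines are assumptions made for illustration; they are not part of the original post, and colClasses="character" is added only so the empty columns stay as empty strings in such a small sample.

```r
# Hypothetical helper (not from the original post): read a "|"-separated
# UTF-8 file and drop a leading byte-order mark only if one is present.
read_pipe_table_skip_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))   # UTF-8 encoding of U+FEFF
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]             # strip the BOM, keep everything else
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"             # declare the remaining bytes as UTF-8
  read.table(text = txt, quote = "", sep = "|",
             stringsAsFactors = FALSE, colClasses = "character")
}

# Offline demo: a small made-up file in the same layout, written with a BOM.
tmp <- tempfile(fileext = ".txt")
sample_lines <- "aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\nace|||Achinese|aceh\n"
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(sample_lines)), tmp)
d <- read_pipe_table_skip_bom(tmp)
unlink(tmp)
print(dim(d))
```

This reads the whole file into memory first, which is fine for a table of a few hundred lines but would not be my choice for large inputs.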
If I deleted no initial bytes, I got:
> socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
> d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
Warning messages:
1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
> dim(d)
[1] 1 1
> str(d)
'data.frame': 1 obs. of 1 variable:
$ V1: chr "?"
If I deleted only the initial 2 bytes, I got an "empty beginning of file" error:
> socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
> readChar(socket, nchars=2, useBytes=TRUE)
[1] "ï»"
> d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
Error in read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
  empty beginning of file
In addition: Warning messages:
1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
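For readers on a newer R: as far as I know, R 3.0.0 and later accept the encoding name "UTF-8-BOM" on connections, which removes a byte-order mark if one is present. It is not available in the R 2.15.1 session shown here. A minimal sketch, using a made-up local file rather than the loc.gov URL so that it runs offline (the file contents and colClasses="character" are assumptions for illustration):

```r
# Assumes R >= 3.0.0, where connections accept encoding = "UTF-8-BOM".
tmp <- tempfile(fileext = ".txt")
sample_lines <- "aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\n"
# Write the UTF-8 BOM (EF BB BF) followed by two sample records.
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(sample_lines)), tmp)

con <- file(tmp, open = "r", encoding = "UTF-8-BOM")
d_bom <- read.table(con, quote = "", sep = "|",
                    stringsAsFactors = FALSE, colClasses = "character")
close(con)    # read.table() does not close a connection it did not open
unlink(tmp)
print(d_bom)
```

The same encoding name should work with url() in place of file(), but I have not checked that against the file discussed in this thread.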
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of peter dalgaard
> Sent: Thursday, September 13, 2012 12:32 PM
> To: sds at gnu.org
> Cc: r-help at r-project.org
> Subject: Re: [R] cannot read iso639 table
>
>
> On Sep 13, 2012, at 19:42 , Sam Steingold wrote:
>
> > line 109 did not have 5 elements ... but it did!
> > empty beginning of file ... but it's not!
> >
> > details:
> > --8<---------------cut here---------------start------------->8---
> > get.language.ISO.table <- function () {
> > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
> > open="r",encoding="utf-8");
> > data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
> > col.names = c("a3bibliographic","a3terminologic",
> > "a2","english","french"));
>
> quote="" would seem to be your friend (apostrophes in the file are doing you in). I can't
> reproduce the "empty beginning" error, though.
>
>
> > close(socket);
> > data
> > }
> > language.ISO.table <- get.language.ISO.table()
>
>
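To illustrate the quote="" remark above: by default read.table() treats both " and ' as quote characters, so an apostrophe inside a field opens a quoted string that can run across field and line boundaries. A small sketch with made-up records (the three lines below are invented for illustration and are not taken from the ISO 639-2 file):

```r
# Made-up records in the same "|"-separated layout; two contain apostrophes.
lines <- c("aaa||aa|First|l'un",
           "bbb||bb|Second|autre",
           "ccc||cc|Third|l'autre")

# With quoting disabled, every line splits into exactly 5 fields.
tc <- textConnection(lines)
n_fields <- count.fields(tc, sep = "|", quote = "")
close(tc)
print(n_fields)

d_ok <- read.table(text = lines, sep = "|", quote = "",
                   stringsAsFactors = FALSE, colClasses = "character")
print(dim(d_ok))

# With the default quote set the apostrophes are treated as quotes. What
# exactly comes back (merged rows, a warning, or an error) depends on where
# the matching apostrophe falls, so the result is only captured here.
res_default <- tryCatch(
  read.table(text = lines, sep = "|", stringsAsFactors = FALSE),
  error = function(e) e,
  warning = function(w) w
)
print(class(res_default))
```

Calling count.fields() with the same sep and quote settings as the intended read.table() call is a quick way to find which lines are being split unexpectedly.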
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>