[R] cannot read iso639 table
Sam Steingold
sds at gnu.org
Thu Sep 13 19:42:05 CEST 2012
line 109 did not have 5 elements ... but it did!
empty beginning of file ... but it's not!
details:
--8<---------------cut here---------------start------------->8---
get.language.ISO.table <- function () {
  socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                open="r",encoding="utf-8");
  data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
                     col.names = c("a3bibliographic","a3terminologic",
                                   "a2","english","french"));
  close(socket);
  data
}
language.ISO.table <- get.language.ISO.table()
Error in read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
col.names = c("a3bibliographic", :
empty beginning of file
--8<---------------cut here---------------end--------------->8---
the first line is _not_ blank, as one can see by downloading the
file with wget.
In addition:
--8<---------------cut here---------------start------------->8---
Warning messages:
1: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic", :
invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---------------cut here---------------end--------------->8---
what is invalid there? libreoffice calc opened the file just fine.
--8<---------------cut here---------------start------------->8---
2: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic", :
incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---------------cut here---------------end--------------->8---
indeed the final newline is missing. why is this a big deal?
when I download the file:
--8<---------------cut here---------------start------------->8---
read.table("ISO-639-2_utf-8.csv",encoding="utf-8", as.is = TRUE,
           sep = "|", header = FALSE,
           col.names = c("a3bibliographic","a3terminologic",
                         "a2","english","french"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 109 did not have 5 elements
--8<---------------cut here---------------end--------------->8---
however
--8<---------------cut here---------------start------------->8---
> l <- readLines("ISO-639-2_utf-8.csv",encoding="utf-8")
Warning message:
In readLines("ISO-639-2_utf-8.csv", encoding = "utf-8") :
incomplete final line found on 'ISO-639-2_utf-8.csv'
> l[108:110]
[1] "dgr|||Dogrib|dogrib"
[2] "din|||Dinka|dinka"
[3] "div||dv|Divehi; Dhivehi; Maldivian|maldivien"
--8<---------------cut here---------------end--------------->8---
all lines look legit to me.
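readLines does no field parsing, so it cannot show how read.table splits each line. One further check, as a minimal sketch only: count.fields reports the number of fields per line, once with the default quote characters and once with quoting disabled. This assumes (unconfirmed here) that quote characters such as apostrophes inside the names might be involved. The sample lines below are a hypothetical stand-in for the downloaded file, and the line containing an apostrophe is invented for illustration.

```r
# Hypothetical stand-in for a few lines of the downloaded file.
# The second line is invented; it is NOT taken from the real table.
sample.lines <- c("dgr|||Dogrib|dogrib",
                  "xx1|||Speaker's language|langue",
                  "din|||Dinka|dinka",
                  "div||dv|Divehi; Dhivehi; Maldivian|maldivien")
tmp <- tempfile(fileext = ".csv")
writeLines(sample.lines, tmp)

# Fields per line with the default quote set (double and single quotes)
default.counts <- count.fields(tmp, sep = "|", comment.char = "")

# Fields per line with quoting disabled
noquote.counts <- count.fields(tmp, sep = "|", quote = "", comment.char = "")

# Compare the two; lines where they differ are the ones to inspect
print(cbind(default = default.counts, noquote = noquote.counts))

# Reading with quoting disabled, same column names as above
tbl <- read.table(tmp, as.is = TRUE, sep = "|", header = FALSE, quote = "",
                  comment.char = "",
                  col.names = c("a3bibliographic","a3terminologic",
                                "a2","english","french"))
```

If the two count vectors differ for the real file, passing quote = "" to read.table would be the thing to try; if they agree, the cause lies elsewhere.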
so, why can't I read the file?
thanks.
ps. ubuntu; R 2.15.1 (2012-06-22) installed from cran using aptitude.
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://dhimmi.com http://memri.org
http://ffii.org http://think-israel.org http://honestreporting.com
The past is gone, the present is ephemeral, the future is a guess.