[R] cannot read iso639 table

Sam Steingold sds at gnu.org
Thu Sep 13 19:42:05 CEST 2012


line 109 did not have 5 elements ... but it did!
empty beginning of file ... but it's not!

details:
--8<---------------cut here---------------start------------->8---
get.language.ISO.table <- function () {
  socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                open="r",encoding="utf-8");
  data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
                     col.names = c("a3bibliographic","a3terminologic",
                       "a2","english","french"));
  close(socket);
  data
}
language.ISO.table <- get.language.ISO.table()

Error in read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
  col.names = c("a3bibliographic", : 
  empty beginning of file
--8<---------------cut here---------------end--------------->8---
the first line is _not_ blank, as one can see by downloading the
file with wget
  
In addition:
--8<---------------cut here---------------start------------->8---
Warning messages:
1: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic",  :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---------------cut here---------------end--------------->8---
what is invalid there? LibreOffice Calc opened the file just fine.
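
one thing that could be checked here is what the first raw bytes of the downloaded file are, e.g. whether it starts with a UTF-8 byte order mark. this is only a sketch: the helper name has.utf8.bom and the temp files are made up for illustration, and nothing here shows that the real file has a BOM or that a BOM is what triggers the warning.

```r
# hypothetical helper: TRUE if the file starts with the UTF-8 BOM (EF BB BF)
has.utf8.bom <- function (path) {
  first <- readBin(path, what = "raw", n = 3)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# two made-up test files, one with a BOM and one without
with.bom <- tempfile()
without.bom <- tempfile()
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("aar||aa|Afar|afar\n")),
         with.bom)
writeBin(charToRaw("aar||aa|Afar|afar\n"), without.bom)

bom.found <- has.utf8.bom(with.bom)
bom.absent <- !has.utf8.bom(without.bom)
unlink(c(with.bom, without.bom))
```

running has.utf8.bom("ISO-639-2_utf-8.csv") on the local copy would at least rule this possibility in or out.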

--8<---------------cut here---------------start------------->8---
2: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic",  :
  incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---------------cut here---------------end--------------->8---
indeed the final newline is missing. why is this a big deal?
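
for comparison, a small sketch with a made-up two-line file that also lacks the final newline: readLines warns about the incomplete final line but still returns every line, so the missing newline by itself may not be what stops the read. the temp file and the variable names are assumptions for illustration only.

```r
# made-up file in the same layout, written without a trailing newline
tmp <- tempfile(fileext = ".txt")
cat("dgr|||Dogrib|dogrib\ndin|||Dinka|dinka", file = tmp)

# record whether a warning was raised, then let the read continue
warned <- FALSE
lines <- withCallingHandlers(
  readLines(tmp),
  warning = function (w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  })
unlink(tmp)

print(length(lines))
print(warned)
```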

when I download the file:

--8<---------------cut here---------------start------------->8---
read.table("ISO-639-2_utf-8.csv", encoding = "utf-8", as.is = TRUE,
           sep = "|", header = FALSE,
           col.names = c("a3bibliographic", "a3terminologic",
                         "a2", "english", "french"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 109 did not have 5 elements
--8<---------------cut here---------------end--------------->8---

however
--8<---------------cut here---------------start------------->8---
> l <- readLines("ISO-639-2_utf-8.csv",encoding="utf-8")
Warning message:
In readLines("ISO-639-2_utf-8.csv", encoding = "utf-8") :
  incomplete final line found on 'ISO-639-2_utf-8.csv'
> l[108:110]
[1] "dgr|||Dogrib|dogrib"                         
[2] "din|||Dinka|dinka"                           
[3] "div||dv|Divehi; Dhivehi; Maldivian|maldivien"
--8<---------------cut here---------------end--------------->8---
all lines look legit to me.
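
one possibility worth testing: read.table defaults to quote = "\"'" and comment.char = "#", so an apostrophe in a field opens a quoted string that can run across line breaks. the sketch below uses made-up rows (the apostrophes stand in for french names such as "l'anglais", which is an assumption about what the real file contains before line 109) and shows that with quoting and comments switched off every row splits into 5 fields.

```r
# made-up rows in the same "|"-separated layout as the ISO 639-2 table
rows <- c("aaa|||Alpha|l'alpha",
          "bbb|||Beta|beta",
          "ccc||cc|Gamma|l'gamma")
cols <- c("a3bibliographic", "a3terminologic", "a2", "english", "french")

# count fields per line with quoting and comment handling disabled
con <- textConnection(rows)
fields <- count.fields(con, sep = "|", quote = "", comment.char = "")
close(con)

# read the rows with the same settings
tab <- read.table(text = rows, sep = "|", header = FALSE, as.is = TRUE,
                  col.names = cols, quote = "", comment.char = "")

# same call with the default quote set: the apostrophe starts a quoted
# string, so rows may get merged or the read may fail
bad <- try(read.table(text = rows, sep = "|", header = FALSE, as.is = TRUE,
                      col.names = cols), silent = TRUE)

print(fields)
print(nrow(tab))
```

if this is the cause, adding quote = "" (and comment.char = "") to the read.table call on the downloaded file should be enough to check it.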

so, why can't I read the file?

thanks.

ps. Ubuntu; R 2.15.1 (2012-06-22) installed from CRAN using aptitude.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://dhimmi.com http://memri.org
http://ffii.org http://think-israel.org http://honestreporting.com
The past is gone, the present is ephemeral, the future is a guess.


