[R] cannot read iso639 table

William Dunlap wdunlap at tibco.com
Thu Sep 13 21:50:21 CEST 2012


On Windows with R-2.15.1 in a 1252 locale, I had to read and discard
the initial 3 bytes (the byte-order mark?) to make things work:

  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > readChar(socket, nchars=3, useBytes=TRUE)
  [1] "ï»¿"
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  > dim(d)
  [1] 485   5
  > head(d)
     V1 V2 V3             V4      V5
  1 aar    aa           Afar    afar
  2 abk    ab      Abkhazian abkhaze
  3 ace             Achinese    aceh
  4 ach                Acoli   acoli
  5 ada              Adangme adangme
  6 ady       Adyghe; Adygei  adyghé
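
The same idea can be sketched without a live connection: read the file as raw bytes, drop the bytes EF BB BF (the UTF-8 encoding of the byte-order mark) if they lead the file, and hand the rest to read.table. This is only a sketch under assumptions: read_table_skip_bom is a hypothetical helper name, and a small temporary file stands in for a downloaded copy of the ISO 639-2 table.

```r
# Hypothetical helper: read a delimited UTF-8 file, dropping a leading BOM if present.
read_table_skip_bom <- function(path, ...) {
  # Read the whole file as raw bytes so that no re-encoding happens yet.
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))  # UTF-8 encoding of U+FEFF
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"  # declare the remaining bytes as UTF-8
  read.table(text = txt, ...)
}

# Stand-in for the downloaded file (an assumption): a BOM followed by two records.
tmp <- tempfile(fileext = ".txt")
body <- enc2utf8("aar||aa|Afar|afar\nady|||Adyghe; Adygei|adygh\u00e9\n")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(body)), tmp)

d <- read_table_skip_bom(tmp, quote = "", sep = "|", stringsAsFactors = FALSE)
dim(d)
d$V1
```

Working on raw bytes avoids depending on how a re-encoding connection treats the leading bytes in a given locale.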

If I deleted no initial bytes, I got:
  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  Warning messages:
  1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  > dim(d)
  [1] 1 1
  > str(d)
  'data.frame':   1 obs. of  1 variable:
   $ V1: chr "?"
If I deleted only the initial 2 bytes, I got an "empty beginning of file" error:
  > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
  > readChar(socket, nchars=2, useBytes=TRUE)
  [1] "ï»"
  > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
  Error in read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) : 
    empty beginning of file
  In addition: Warning messages:
  1: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
  2: In read.table(socket, quote = "", sep = "|", stringsAsFactors = FALSE) :
    incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
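
The "ï»" shown above is how the bytes EF BB print in a 1252 locale, which is consistent with the file starting with the 3-byte UTF-8 byte-order mark EF BB BF. A quick way to check that directly is to look at the first three raw bytes. The sketch below assumes a local file; has_utf8_bom is a hypothetical helper name, and the two temporary files are made up for illustration.

```r
# Hypothetical helper: TRUE if the file begins with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  con <- file(path, open = "rb")  # binary mode, so the bytes are not re-encoded
  on.exit(close(con))
  first <- readBin(con, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Two illustrative files: one with a BOM, one without.
with_bom <- tempfile()
without_bom <- tempfile()
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("aar||aa|Afar|afar\n")), with_bom)
writeBin(charToRaw("aar||aa|Afar|afar\n"), without_bom)

has_utf8_bom(with_bom)
has_utf8_bom(without_bom)
```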

  > sessionInfo()
  R version 2.15.1 (2012-06-22)
  Platform: x86_64-pc-mingw32/x64 (64-bit)
  
  locale:
  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
  [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
  [5] LC_TIME=English_United States.1252    
  
  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     
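
The session above is R 2.15.1. Later releases (R 3.0.0 and up, going by the R NEWS) accept "UTF-8-BOM" as an encoding name for connections, which removes a leading byte-order mark if one is present. A minimal sketch, assuming such a version and using a made-up temporary file in place of the real table:

```r
# Stand-in file (an assumption): a BOM followed by two records from the table.
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\n")),
         tmp)

# fileEncoding = "UTF-8-BOM" asks the connection to drop the BOM while reading.
d <- read.table(tmp, fileEncoding = "UTF-8-BOM", quote = "", sep = "|",
                stringsAsFactors = FALSE)
d$V1
```

With this encoding name no explicit readChar step should be needed before read.table.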

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of peter dalgaard
> Sent: Thursday, September 13, 2012 12:32 PM
> To: sds at gnu.org
> Cc: r-help at r-project.org
> Subject: Re: [R] cannot read iso639 table
> 
> 
> On Sep 13, 2012, at 19:42 , Sam Steingold wrote:
> 
> > line 109 did not have 5 elements ... but it did!
> > empty beginning of file ... but it's not!
> >
> > details:
> > --8<---------------cut here---------------start------------->8---
> > get.language.ISO.table <- function () {
> >  socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
> >                open="r",encoding="utf-8");
> >  data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
> >                     col.names = c("a3bibliographic","a3terminologic",
> >                       "a2","english","french"));
> 
> quote="" would seem to be your friend (apostrophes in the file are doing you in). I can't
> reproduce the "empty beginning" error, though.
> 
> 
> >  close(socket);
> >  data
> > }
> > language.ISO.table <- get.language.ISO.table()
> 
> 
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
> 



