[R-SIG-Mac] Reading in a table originally with ISO-latin1 encoding (Linux)

Antti Arppe aarppe at ling.helsinki.fi
Fri Jun 9 20:22:49 CEST 2006


Dear developers

I first asked this on the general R-help list, and the responses I 
received suggested that I consult this list, as my problem appears 
to be specific to Macintosh.

I have earlier been working (and continue to work) with R in Linux, 
where reading in a table containing Scandinavian letters ("ä", "ö", 
and "å") in the header as part of variable names has not caused any 
problem whatsoever.

However, doing the same in R on a new MacOS-X machine (with an Intel 
processor) and the same original text table does not seem to work, 
whichever way I try. Following the recommendations on the R site and 
using the 'file' function to set the encoding (which, as far as I have 
been able to understand, is ISO-latin1) breaks down at the first 
encounter with a Scandinavian character:

THINK <- read.table(file("R_data/hs+sfnet.T.060505.tbl4", 
encoding="latin1"),header=TRUE)
Warning messages:
1: invalid input found on input connection 'R_data/hs+sfnet.T.060505.tbl4'
2: incomplete final line found by readTableHeader on 
'R_data/hs+sfnet.T.060505.tbl4'

A sample source table exemplifying such characters as variable labels 
is below (for which the behavior of R in Mac is the same as for the 
larger file referred to above):

    ajatella miettiä pohtia
1     FALSE   FALSE   TRUE
2     FALSE   FALSE  FALSE
3     FALSE    TRUE  FALSE
4     FALSE    TRUE  FALSE
5      TRUE   FALSE  FALSE
6      TRUE   FALSE  FALSE
7     FALSE   FALSE  FALSE
8     FALSE    TRUE  FALSE
9     FALSE    TRUE  FALSE
10    FALSE   FALSE  FALSE

Converting the file from ISO-latin1 to UTF-8 (with Mac's TextEdit 
application) allows the file to be read in its entirety, but the 
Scandinavian character in the heading is still coerced to a period 
'.', or in fact two (i.e. 'miettiä' -> 'miett..').
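
The period substitution described above is consistent with what 
'read.table' does when 'check.names' is TRUE (its default): header 
entries are passed through 'make.names', which replaces anything it 
does not accept as part of a syntactic name with '.'. A minimal 
sketch, assuming that this is indeed the cause here and that a 
non-UTF-8 locale makes each of the two UTF-8 bytes of 'ä' count as 
invalid; the inline table is a hypothetical cut-down version of the 
sample above:

```r
# make.names() replaces characters that are not valid in a syntactic name with "."
make.names("two words")            # "two.words"

# Hypothetical two-row excerpt of the sample table; "\u00e4" is the letter "ä".
txt <- c("ajatella mietti\u00e4 pohtia",
         "1 FALSE FALSE TRUE",
         "2 FALSE TRUE FALSE")

# Default: header names are checked and, where needed, altered by make.names().
checked <- read.table(text = txt, header = TRUE)

# check.names = FALSE keeps the header exactly as read.
unchecked <- read.table(text = txt, header = TRUE, check.names = FALSE)

print(names(checked))
print(names(unchecked))
```

Whether the two results differ depends on the locale R is running in, 
so comparing them on both machines may help narrow the problem down.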

Have I possibly misunderstood how the 'file' function should be used 
in conjunction with 'read.table', or might the problem with 
latin1-to-UTF-8 conversion be somewhere else? In my mind, it would be 
most preferable if I were able to operate with the same files in 
both MacOS-X and Linux.
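
For reference, a minimal sketch of how 'file' and 'read.table' are 
meant to combine for a Latin-1 file, assuming R is running in a UTF-8 
locale (in a locale that cannot represent 'ä', the re-encoding itself 
may fail). The temporary file and its contents are hypothetical 
stand-ins for the real table; the 'fileEncoding' argument is a 
shortcut that exists in current versions of R:

```r
# Write a small Latin-1 encoded table to a temporary file (hypothetical data).
lines_utf8 <- c("ajatella mietti\u00e4 pohtia",
                "1 FALSE FALSE TRUE",
                "2 FALSE TRUE FALSE")
tmp <- tempfile(fileext = ".tbl")
con <- file(tmp, open = "wb")
# useBytes = TRUE writes the Latin-1 bytes as they are; writeLines() also ends
# the last line with a newline, which avoids an "incomplete final line" warning.
writeLines(iconv(lines_utf8, from = "UTF-8", to = "latin1"), con, useBytes = TRUE)
close(con)

# Variant 1: declare the file's encoding on the connection.
a <- read.table(file(tmp, encoding = "latin1"), header = TRUE, check.names = FALSE)

# Variant 2: let read.table() create the re-encoding connection itself.
b <- read.table(tmp, header = TRUE, fileEncoding = "latin1", check.names = FALSE)

print(names(a))
unlink(tmp)
```

Both variants read the same Latin-1 file unchanged, so the same file 
could in principle be shared between MacOS-X and Linux.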

Appreciating any help on this matter,

-- 
======================================================================
Antti Arppe - Master of Science (Engineering)
Researcher & doctoral student (Linguistics)
E-mail: antti.arppe at helsinki.fi
WWW: http://www.ling.helsinki.fi/~aarppe

