[R-SIG-Mac] Reading in a table originally with ISO-latin1 encoding (Linux)
Antti Arppe
aarppe at ling.helsinki.fi
Fri Jun 9 20:22:49 CEST 2006
Dear developers,
I asked this first on the general R-help list, and the responses I
received suggested that I consult this list, as my problem appears
to be specific to Macintosh.
I have earlier been working (and continue to work) with R in Linux,
where reading in a table containing Scandinavian letters ("ä", "ö",
and "å") in the header as part of variable names has not caused any
problem whatsoever.
However, doing the same in R running on a new MacOS-X machine
(with an Intel processor) with the same original text table does not
seem to work whichever way I try. Following the recommendations on the
R site and using the 'file' function to set the encoding (which, as far
as I have been able to understand, is ISO-latin1) breaks down at the
first encounter with a Scandinavian character:
THINK <- read.table(file("R_data/hs+sfnet.T.060505.tbl4",
encoding="latin1"),header=TRUE)
Warning messages:
1: invalid input found on input connection 'R_data/hs+sfnet.T.060505.tbl4'
2: incomplete final line found by readTableHeader on
'R_data/hs+sfnet.T.060505.tbl4'
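Since the file's encoding is only assumed to be ISO-latin1, one way to narrow the problem down is to look at the raw bytes and at the locale the R session itself is running in. The following is a minimal sketch, not a confirmed diagnosis: the temporary file and its contents are hypothetical stand-ins for the real table, and the byte checks only suggest an encoding rather than prove it.

```r
# Build a small latin1 file to stand in for the real table (hypothetical sample).
path <- tempfile(fileext = ".tbl")
writeBin(c(charToRaw("ajatella mietti"), as.raw(0xE4), charToRaw(" pohtia\n"),
           charToRaw("1 FALSE FALSE TRUE\n")), path)

# Inspect the raw bytes: a lone 0xE4 suggests latin1 "a-umlaut",
# while the byte pair 0xC3 0xA4 suggests the same letter in UTF-8.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
has_latin1_a_umlaut <- any(bytes == as.raw(0xE4))
has_utf8_a_umlaut <- any(bytes[-length(bytes)] == as.raw(0xC3) &
                         bytes[-1] == as.raw(0xA4))
print(c(latin1 = has_latin1_a_umlaut, utf8 = has_utf8_a_umlaut))

# Report the character encoding the session itself is running in.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[c("MBCS", "UTF-8", "Latin-1")])
```

If the session locale turns out to be neither UTF-8 nor Latin-1, re-encoding the input to the native encoding may be what triggers the "invalid input" warning, though that is an assumption to be checked on the machine in question.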
A sample source table exemplifying such characters as variable labels
is below (for which the behavior of R in Mac is the same as for the
larger file referred to above):
   ajatella miettiä pohtia
1     FALSE   FALSE   TRUE
2     FALSE   FALSE  FALSE
3     FALSE    TRUE  FALSE
4     FALSE    TRUE  FALSE
5      TRUE   FALSE  FALSE
6      TRUE   FALSE  FALSE
7     FALSE   FALSE  FALSE
8     FALSE    TRUE  FALSE
9     FALSE    TRUE  FALSE
10    FALSE   FALSE  FALSE
Converting the file from ISO-latin1 to UTF-8 (with Mac's TextEdit
application) allows the file to be read in in its entirety, but still
the Scandinavian character in the heading is coerced to a period '.',
or two, in fact (i.e. 'miettiä' -> 'miett..').
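One possible explanation for the periods, offered as an assumption rather than a confirmed diagnosis: by default 'read.table' uses check.names = TRUE, which passes the header through make.names(), and that function replaces every character it does not consider valid in the current locale with '.'. A two-byte UTF-8 'ä' read in a non-UTF-8 locale would then become two periods. The sketch below uses a hypothetical temporary file in place of the converted table and turns that check off.

```r
# Build a UTF-8 version of the sample table (hypothetical stand-in for the
# file converted with TextEdit).
path <- tempfile(fileext = ".tbl")
writeBin(c(charToRaw("ajatella mietti"), as.raw(c(0xC3, 0xA4)),
           charToRaw(" pohtia\n"),
           charToRaw("1 FALSE FALSE TRUE\n2 FALSE TRUE FALSE\n")), path)

# check.names = FALSE skips make.names(), so the header is kept as read;
# encoding = "UTF-8" only marks the strings as UTF-8, it does not re-encode them.
tbl <- read.table(path, header = TRUE, check.names = FALSE, encoding = "UTF-8")
print(names(tbl))
print(dim(tbl))
```

With check.names = FALSE the variable names are no longer guaranteed to be syntactically valid, so columns may need to be accessed as tbl[["..."]] or with backticks rather than with a bare tbl$name.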
Have I possibly misunderstood how the 'file' function should be used
in conjunction with 'read.table', or might the problem with
latin1-to-UTF-8 conversion be somewhere else? In my mind, it would be
preferable if I were able to operate with the same files in
both MacOS-X and Linux.
Appreciating any help on this matter,
--
======================================================================
Antti Arppe - Master of Science (Engineering)
Researcher & doctoral student (Linguistics)
E-mail: antti.arppe at helsinki.fi
WWW: http://www.ling.helsinki.fi/~aarppe